BiteBuddy
Get me something for breakfast, something hearty but cheap
June 2024 — July 2024
Summary
This project was made during the hackjakarta hackathon, which I did with 3 other teammates. This hackathon was part of my preparation before going to Canada to participate in the hackathons there (see more here). During the 24-hour duration of the hackathon, we ended up making a product called BiteBuddy that is meant to be an extension of Grab's existing app. The demo video for the app can be seen here.
Team Composition
Alisha was the one who originally wanted to join the hackathon. She then asked around our circle of friends to see if any of them would like to join her. Seeing that joining several hackathons before my departure to Canada was something I had already planned for a long time, I immediately jumped at this opportunity. My other friends, Akbar and Awe, also decided to join. So that was how the team formed.
The division of roles was almost exactly like in GarudaHacks. Awe, Akbar, and I are the engineers who build the product that we actually want to create (I'll talk more about the further division of labor among us three later). Meanwhile, Alisha is a fusion of the archetypal hustler and hipster.
Developer Diaries
This hackathon had a pretty lax rule about doing preparation work before the hackathon started. It only demanded that every piece of code written for the project be written during the hackathon itself. Other artifacts of development, such as pre-trained models, UI/UX designs, or system designs, were allowed to be created before the hackathon started. This is why the discussion on what project we were making and how we were going to execute it goes much farther back than in my previous hackathon, GarudaHacks, which had a much more stringent rule regarding that (see my experience in that hackathon here).
What Are We Making?
This question had been asked since the first meeting that my team and I had in mid-June. In that first meeting, after dealing with the registration for the hackathon, we tried to answer it by first taking a look at the available tracks of the hackathon. We promptly decided to take on the last track: improving and augmenting Grab's existing offerings with generative AI. There were a lot of ideas being bounced back and forth, but by the conclusion of that first meeting, we managed to settle on a food recommender chatbot for the GrabFood service. The reasoning behind the idea was our own anecdotal experience of intense decision paralysis when choosing food to order through the service, due to the overwhelming number of options available.
As time went on, however, we started to reconsider the idea. Our main concern back then was that a food recommender app would be one of the most popular and common ideas among the participants. This is especially true considering that the idea was included in the idea bank for the hackathon created by the organizer. We decided that we needed some sort of "twist" on top of the idea to make us stand out among the other teams who would likely have a similar idea. We spent a handful of meetings deciding on said twist. I suggested that we use the recommender not to recommend food, but to recommend restaurants for Grab's other upcoming service, Grab Dine-In. This idea quickly got shot down, however, because of the much smaller adoption of that service among Grab users.
What we ended up agreeing on was framing the app as an aid for visually impaired users. Alisha came up with the idea after trying out one of iOS's accessibility features, VoiceOver, on the Grab application. The feature is meant to help those with visual impairments by reading out loud the phone's screen and the application that the user is currently using. Using the feature on Grab's application was a godawful and hilarious experience, as it often just kept repeating the same food name over and over again.
There weren't that many additions needed to gear our original idea toward the aforementioned twist. It's still largely the same app; we only had to add voice input and output to it. What required significant change was how we market and pitch the app. Instead of trying to solve decision paralysis, we instead brought up the significant number of visually impaired Grab users. It has to be noted here that when we talk about people who are visually impaired, it's not just people with partial or complete blindness; we're also talking about, for example, older people whose vision has substantially degenerated. I'll talk more later on whether having this twist was a winning move or not.
How Are We Making That?
We would use Claude 3.5 Sonnet and langgraph for the chatbot. Most of the chatbot's flow (written in langgraph) is dedicated to parsing the user's message and the entire conversation to produce tangible attributes and a description of the food (or beverage) that the user would like. Retrieval-augmented generation would, once again, be at the heart of the recommendation system: those tangible attributes and descriptions are used to query a vector database that contains all of the food and beverage data. The results then go through some filtering and processing in the chatbot's remaining flow. The architecture of the app we were working on was heavy on the back-end; the user-facing application itself would just be a web page built with React and other UI libraries.
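To make that architecture a bit more concrete, here is a minimal TypeScript sketch of that kind of flow, assuming the LangGraph JS API (StateGraph/Annotation) since our back-end was an Express project. The node names, prompts, and the queryVectorDb stub are illustrative, not our actual code.

```ts
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";
import { ChatAnthropic } from "@langchain/anthropic";

const State = Annotation.Root({
  conversation: Annotation<string>(),   // full chat history, flattened into text
  foodAttributes: Annotation<string>(), // e.g. "hearty, cheap, breakfast"
  candidates: Annotation<string[]>(),   // items returned by the vector database
  reply: Annotation<string>(),          // final recommendation shown to the user
});

const llm = new ChatAnthropic({ model: "claude-3-5-sonnet-20240620" });

// Stand-in for the vector database query (the real one hits Weaviate).
async function queryVectorDb(attributes: string): Promise<string[]> {
  return ["bubur ayam", "nasi uduk", "lontong sayur"];
}

// 1. Parse the conversation into tangible attributes of the desired food or beverage.
async function extractAttributes(state: typeof State.State) {
  const res = await llm.invoke(
    `Describe, as a short comma-separated list, the food the user wants:\n${state.conversation}`
  );
  return { foodAttributes: res.content as string };
}

// 2. RAG step: use those attributes to retrieve candidate items.
async function retrieveCandidates(state: typeof State.State) {
  return { candidates: await queryVectorDb(state.foodAttributes) };
}

// 3. Filter/process the results into the final recommendation message.
async function respond(state: typeof State.State) {
  const res = await llm.invoke(
    `Recommend one of these to the user and explain why: ${state.candidates.join(", ")}`
  );
  return { reply: res.content as string };
}

export const recommendGraph = new StateGraph(State)
  .addNode("extractAttributes", extractAttributes)
  .addNode("retrieveCandidates", retrieveCandidates)
  .addNode("respond", respond)
  .addEdge(START, "extractAttributes")
  .addEdge("extractAttributes", "retrieveCandidates")
  .addEdge("retrieveCandidates", "respond")
  .addEdge("respond", END)
  .compile();
```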
It goes without saying that I was, once again, placed as the GenAI jockey. I was placed there because of my experience playing with GenAI during my internship at Covena at the time, and so I was responsible for the chatbot and its langgraph flow. Akbar took the role of data engineer. He was responsible for collecting food and beverage data from GrabFood (by means that I won't disclose), as well as managing the vector database that we'd be using. Awe is traditionally a more low-level developer, dealing with OSes and computer networks, but he had to be the front-end developer, building the user-facing application.
After doing GarudaHacks, one of the key takeaways I got was that I have to be proficient in the tech stack that I'm using. To make the chatbot that we were aiming for, just langchain wouldn't do; we needed a true stateful agent, which is only made possible with langgraph. Fortunately, by that point I had had plenty of experience dealing with langgraph in my internship at Covena, unlike two weeks prior during GarudaHacks. In addition to that, I also understood that, as is the case in many other things, preparation is one of the most important parts of a hackathon. And so, I designed the langgraph flow of the chatbot before the hackathon even started. See the gallery below for that.
Grab Office and the Commute To It
The hackathon took place in Grab's corporate office in South Jakarta. I originally wanted to commute there using the KRL and MRT from my father's house in Tangsel, but my father offered to just drive me there, which made things a whole lot easier (thanks dad!). I asked to be dropped off at one of the MRT stations, as my teammates were commuting to the office by MRT.
We then walked together from the station to Grab's office. It was only several hundred meters, but it felt long thanks to Jakarta's scorching heat and humid air. We were in awe at how the building looked from the outside, since it looked quite unique and futuristic. The inside of the office, at least the part in which the hackathon took place, was also a really comfortable place to work in.
I think this is typical of most hackathons, but it's really neat that we got meals for the entire hackathon, as well as snacks. That, coupled with the merch we got (a handheld electric fan, a tumbler, a shirt, and a tote bag), made the event still worthwhile to register and pay money for, even if we didn't win.
All of the participants were placed in one of the largest office rooms in the building for the opening ceremony. It was quite intimidating seeing the sheer number of people we had to compete against. There was a talk by one of Grab's executives about Grab's vision and whatnot; it was a pretty engaging talk that I enjoyed. It was only then that I realized Grab uses its own map and geographical data instead of outsourcing that to Google Maps. They even created their own hardware for their drivers to collect map data. The executive said that the reason Grab has its own in-house map data is the nature of Southeast Asia's unconventional and disorganized road systems.
Off to the Races!
The countdown began that morning. We had a very interesting development setup: a virtual private server (VPS) partially rented by Awe that we ssh into via VSCode to do our development. This eliminated one of the major problems of real-time collaboration in programming that I've experienced when working with other people on a project: unsynchronized files and data. With this somewhat crazy setup, we didn't have to spam git commit to get our changes onto our peers' machines. The VPS was also where the app would be deployed.
After I initialized the Express project for the back-end and the myriad of dependencies that I would need to get the project running, I started implementing the "starter" flows that the langgraph graph must go through first: the greetings answerer and the chat filter. I installed langsmith early on to get an accurate assessment of how the chatbot was performing its task and a powerful tool for debugging the langgraph flow. I was doing this until the evening of that day. I believe one of the reasons I lagged a bit in starting the implementation of the core of the langgraph flow (which is responsible for building the query to the vector database) was that Akbar's database, which uses Weaviate, wasn't really working yet. I remember Akbar being frustrated by the fact that, when queried about fried chicken, the vector database returned an ice cream or something.
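For illustration, here is a minimal sketch of how that starter routing could look as a conditional edge in langgraph (again assuming the LangGraph JS API; the node names are made up, and the keyword check stands in for what were really LLM calls):

```ts
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

const State = Annotation.Root({
  message: Annotation<string>(),
  category: Annotation<"greeting" | "off_topic" | "food_request">(),
  reply: Annotation<string>(),
});

// In the real flow an LLM call does this classification; a keyword check stands in here.
async function classify(state: typeof State.State) {
  if (/^(hi|hello|halo)\b/i.test(state.message)) return { category: "greeting" as const };
  if (!/food|eat|drink|makan|minum/i.test(state.message)) return { category: "off_topic" as const };
  return { category: "food_request" as const };
}

async function answerGreeting() {
  return { reply: "Hi! Tell me what you feel like eating." };
}

async function rejectOffTopic() {
  return { reply: "Sorry, I can only help with food and drink recommendations." };
}

// Placeholder for the core recommendation flow described earlier.
async function coreFlow() {
  return { reply: "(continue into the recommendation flow)" };
}

export const starterGraph = new StateGraph(State)
  .addNode("classify", classify)
  .addNode("answerGreeting", answerGreeting)
  .addNode("rejectOffTopic", rejectOffTopic)
  .addNode("coreFlow", coreFlow)
  .addEdge(START, "classify")
  .addConditionalEdges("classify", (s: typeof State.State) =>
    s.category === "greeting" ? "answerGreeting"
    : s.category === "off_topic" ? "rejectOffTopic"
    : "coreFlow")
  .addEdge("answerGreeting", END)
  .addEdge("rejectOffTopic", END)
  .addEdge("coreFlow", END)
  .compile();
```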
Our work up to that evening had to be halted all of a sudden due to a mandatory exercise session from the organizer at 8 PM (no, I will not elaborate). Akbar had the most to say about this, since the session started just as he got into the "zone" for debugging the buggy search of the vector database. After the session was done, I got to work immediately on the core of the langgraph flow. While I was doing that, Akbar discovered that he, in his infinite wisdom, had used an embedding for the food data that was not meant to be used for semantic search. At the time I thought that was just hilarious and comical. After switching embeddings (which took a while since we had around 4,000 or so food entries), the vector database was able to give much more accurate answers. Akbar abstracted away the database by making a function that queries the vector database when given the necessary attributes about the user's food preferences. My responsibility was to implement the langgraph flow that transforms the user's conversation into said attributes.
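To give a sense of what that abstraction might look like, here is a rough sketch assuming the v2 weaviate-ts-client, a "Food" class with name/price/merchant properties, and an externally computed embedding for the query; the attribute shape, embedding model, and schema in Akbar's actual code may well have differed.

```ts
import weaviate from "weaviate-ts-client";
import { OpenAIEmbeddings } from "@langchain/openai";

const client = weaviate.client({ scheme: "http", host: "localhost:8080" });
const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });

// Hypothetical shape of the attributes produced by the langgraph flow.
interface FoodAttributes {
  description: string; // e.g. "hearty, cheap breakfast, not greasy"
  maxPrice?: number;
}

// Return the 10 items whose vectors are nearest to the embedded description.
async function queryVectorDb(attrs: FoodAttributes) {
  const vector = await embeddings.embedQuery(attrs.description);

  let query = client.graphql
    .get()
    .withClassName("Food")
    .withFields("name price merchant _additional { distance }")
    .withNearVector({ vector })
    .withLimit(10);

  // Optional structured filter layered on top of the vector search.
  if (attrs.maxPrice !== undefined) {
    query = query.withWhere({
      path: ["price"],
      operator: "LessThanEqual",
      valueNumber: attrs.maxPrice,
    });
  }

  const res = await query.do();
  return res.data.Get.Food;
}
```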
The core langgraph flow took quite a while to get done; I started at around 9 PM and was only done at 3 AM. Looking back at it, I think I could've made the flow faster, but there were some factors that slowed the development down. One of them was the fact that I decided to reduce the chatbot's slow time-to-response through two measures:
Implementing parallelization in the parts of the flow that don't depend on each other (see a picture of this new flow in the gallery, next to the original one).
Using OpenAI's considerably faster 4o-mini model for the less critical parts of the flow, such as the one that forms the final response or the one that answers greetings. (A rough sketch of both measures is shown right after this list.)
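Here is a rough sketch of both measures, assuming the LangGraph JS API: independent nodes fan out from the same point so langgraph runs them in parallel, and the less critical node uses the faster gpt-4o-mini instead of Claude 3.5 Sonnet. The node names are illustrative; the real flow parallelized different steps.

```ts
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";
import { ChatAnthropic } from "@langchain/anthropic";
import { ChatOpenAI } from "@langchain/openai";

const sonnet = new ChatAnthropic({ model: "claude-3-5-sonnet-20240620" }); // critical reasoning
const mini = new ChatOpenAI({ model: "gpt-4o-mini" });                     // cheap, fast nodes

const State = Annotation.Root({
  conversation: Annotation<string>(),
  tasteAttributes: Annotation<string>(),
  budgetAttributes: Annotation<string>(),
  reply: Annotation<string>(),
});

// These two extractions don't depend on each other, so they can run in parallel.
async function extractTaste(state: typeof State.State) {
  const res = await sonnet.invoke(`What taste/cuisine does the user want?\n${state.conversation}`);
  return { tasteAttributes: res.content as string };
}
async function extractBudget(state: typeof State.State) {
  const res = await sonnet.invoke(`What budget does the user mention?\n${state.conversation}`);
  return { budgetAttributes: res.content as string };
}

// Forming the final response is less critical, so the faster model handles it.
async function respond(state: typeof State.State) {
  const res = await mini.invoke(
    `Recommend food matching: ${state.tasteAttributes}; budget: ${state.budgetAttributes}`
  );
  return { reply: res.content as string };
}

export const parallelGraph = new StateGraph(State)
  .addNode("extractTaste", extractTaste)
  .addNode("extractBudget", extractBudget)
  .addNode("respond", respond)
  // Fan out: both extraction nodes start from START and run in the same step.
  .addEdge(START, "extractTaste")
  .addEdge(START, "extractBudget")
  // Fan in: respond waits for both branches before running.
  .addEdge(["extractTaste", "extractBudget"], "respond")
  .addEdge("respond", END)
  .compile();
```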
Once the overall langgraph flow of the chatbot was done and the chatbot was functional in giving out food recommendations, to the delight of all of my teammates, my sleep-deprived self immediately fell asleep on the floor with the pillow that I brought from Tangsel.
Final Countdown
I woke up around 3 or so hours later with my teammates still working. I asked why they didn't wake me up earlier, and they said they felt pity because I had been working really intensely before I fell asleep. This didn't matter all that much since the core feature of the chatbot was done; I just needed to test the chatbot and make appropriate refinements based on the results of those tests. One of those improvements was the addition of another node in the langgraph flow after the vector database query. The database query was made to produce the top 10 most similar results, and the new langgraph node then chooses the one food or beverage that best fits the attributes inferred from the user's conversation. The addition of this node prevented many of the obvious mistakes that the chatbot made prior to this improvement.
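The new node boils down to something like the sketch below: the vector query returns the top 10 most similar items, and a model call narrows them down to the single best fit. The names, fields, and choice of model here are illustrative, not our actual code.

```ts
import { ChatOpenAI } from "@langchain/openai";

const picker = new ChatOpenAI({ model: "gpt-4o-mini" });

interface FoodItem {
  name: string;
  price: number;
  merchant: string;
}

async function pickBestCandidate(
  attributes: string,     // e.g. "hearty, cheap, breakfast, not greasy"
  candidates: FoodItem[], // the top-10 results from the vector database query
): Promise<FoodItem> {
  const listing = candidates
    .map((c, i) => `${i}. ${c.name} (Rp${c.price}) from ${c.merchant}`)
    .join("\n");
  const res = await picker.invoke(
    `The user wants food that is: ${attributes}\n` +
      `Candidates:\n${listing}\n` +
      `Reply with only the number of the single best-fitting candidate.`
  );
  const index = parseInt((res.content as string).trim(), 10);
  // Fall back to the most similar item if the model replies with something unexpected.
  return Number.isNaN(index) || !candidates[index] ? candidates[0] : candidates[index];
}
```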
Another improvement I made was the ability of the chatbot to give out vague and non-specific recommendations when the user prompts it with a less specific question, such as "what do you think is the best food to get for breakfast?". We honestly should've thought of this obvious functionality during one of our first meetings for the hackathon. It wasn't that big of a deal, however, since it's not a particularly hard functionality to implement.
One or two hours before the time was up, we went to one of the quieter floors higher up in the building to make the mandatory demo video. Here's the result. We were surprised that the app worked perfectly fine on the first take of the video.
Alisha had the amazing idea to hand out a pamphlet with a QR code to our app's URL during the pitching later. We had the pamphlet printed out by a good friend of Alisha's, Kak Azkal. She is a senior of ours from the information systems and technology major who, funnily enough, works for Grab in the building where the hackathon took place.
After that, most of the development for the app was sorta done and we were just preparing the necessary deliverables for submission. It was quite notable how we got the app done quite comfortably within the 24-hour mark of the hackathon.
And the time is finally up!
Pitching
After the time was up, we mostly used the remaining time to have lunch and take a rest; we didn't really use it to prepare our pitch. One thing for sure was that I would have to be the one speaking during the pitch, because the pitching had to be done in English. That's why we were kind of caught with our pants down when it was revealed that we were third in line for the pitching in our track. We were pitching to one of the EVPs of Bukalapak and the principal product manager of Grab.
I didn't really script the pitch; I only kept several talking points in mind and used them to make an impromptu pitch. This is how I usually roll when I have to make a speech in an MUN as well: several key points to convey, but mostly impromptu.
In the middle of my pitch, I tried to demo the app by speaking into it, but the app failed to produce an output for some reason. So I had to roll with the punches and continue the pitch even after the app failed. The two judges asked why the app wasn't being rolled out to everyone instead of only the visually impaired. I tried to clarify that the app is meant to be used by everyone; it just particularly helps visually impaired people (I'll talk more later on why this is important). Finally, after all was said and done, I handed the judges our pamphlet.
After the pitch, I did a postmortem on the error by checking the langsmith logs. It turned out that the reason the app didn't work was that, in one of the langchain chains inside one of the nodes, Claude Sonnet decided to add comments to the JSON output it generated, causing the langchain JSON parser to throw an error. That was unfortunate, but it was what it was.
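For illustration of the failure mode, here's a rough sketch of the kind of defensive parsing that would have guarded against it; we didn't have anything like this in place at the time, and the helper name is made up.

```ts
// Strip Markdown code fences and whole-line "//" comments from the model output
// before handing it to JSON.parse.
const FENCE = new RegExp("`{3}(?:json)?", "g");

function parseModelJson<T>(raw: string): T {
  const cleaned = raw
    .replace(FENCE, "")                       // drop code fences
    .split("\n")
    .filter((line) => !/^\s*\/\//.test(line)) // drop whole-line // comments
    .join("\n")
    .trim();
  return JSON.parse(cleaned) as T;
}

// The kind of output that broke our flow during the demo: valid-looking JSON,
// except the model helpfully annotated it with a comment.
const problematic = `{
  // the user's preferred cuisine
  "cuisine": "Indonesian",
  "budget": "cheap"
}`;

console.log(parseModelJson<{ cuisine: string; budget: string }>(problematic));
```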
Another unfortunate thing was that the judges didn't try the application using the QR code on the pamphlet we gave them. I knew this because I checked the logs in langsmith and didn't find any new entries other than the ones from Kak Azkal trying out our application. This is understandable, since the judges probably had very limited time to judge our project. More importantly, to ensure fairness, the judging guidelines might explicitly forbid the judges from trying out a participant's app outside of that participant's pitch.
After the pitching concluded, the four of us jokingly pondered skipping the finalist announcement to instead go to Pondok Indah Mall (PIM) to hang out and refresh ourselves. For better or for worse, we ended up not doing that.
Conclusion and Takeaways
The announcement of the finalists was delayed by 1.5 hours. The way the announcement worked is that each track's finalists were announced sequentially, and each finalist of each track was given the chance to make a final pitch on stage, judged by the final-stage judges (who are different from the judges of the previous stage). Since the announcement for our track was placed last (probably because it was the most popular track), we had to sit through the pitches of the finalists of the previous two tracks: ten pitches in total (because each track has 5 finalists), plus Q&A from the judges for each of those pitches.
When the announcement for our track came around, we were disappointed to learn that we didn't get into the finals. By then, evening was approaching, so we decided to watch only one pitch from one of the finalists of our track before heading out. It was a restaurant reservation maker powered by GenAI that could automatically make phone calls to multiple restaurants to reserve a place for the user.
After we headed out, we pondered why we didn't make it to the finals and ended up drawing some conclusions about our experience. There were a lot more lessons that I learned from this hackathon compared to GarudaHacks:
Emphasis on uniqueness.
Know your judges.
Quality and soundness over implementation details.
Emphasis on Uniqueness
We later found out that one of the winners of our track was also a chatbot addition to a Grab product, but their product wasn't limited to just GrabFood; they also included GrabRides and GrabCar. What my team and I took away from that fact is the following lesson.
Consider a hackathon with a set of projects A = {a, b, c, d, e, f}, where a ~ b ~ c ~ d in terms of their core premises, while e and f are unlike the rest. The first four projects will have to compete against one another on who has the most sound business case, who is the most technically sophisticated, and so on, while the latter two projects can be assessed independently to a significantly larger extent. In summary, the more common your product's idea is, the tougher the competition will be. This lesson is echoed by the fact that, despite a very high chance of food recommender chatbot projects being overrepresented among the participants of our track, all of the finalists had different product ideas. Said overrepresentation was not reflected in the composition of the finalists, likely because the judges only picked the one project that executed the premise best to get into the final round: the aforementioned winner.
Know Your Judges
When the pitching order for our track was about to be announced, I was preparing my pitch, ready to mention the sophistication of the chatbot we developed and how the flow was able to handle descriptions of food that the user would not prefer: "I would like to get something that isn't greasy". But the judges we got were not engineers; they were an executive and a product manager. That's why we got asked about how the product would be rolled out, instead of about the edge cases the project could handle or the technical feasibility of the product. My team should've geared the pitch accordingly.
Quality and Soundness Over Implementation Details
I had already learned this in GarudaHacks, but it got cemented here.
hackjakarta sent us the judges' feedback on our project a couple of days after the hackathon concluded. From the feedback, it seems obvious that the judges thought we originally pitched our product as something that would only be released for those with visual impairments, before pivoting to releasing it to everyone else when the judges asked why the product wasn't being rolled out to everyone. This was the whole extent of the feedback. There were no comments on why the product failed to produce an output during the demo; the feedback was entirely about the target market of the product.
There are several evaluations that can be derived from the above feedback:
I could have made it clearer to the judges that this is a case in which an accessibility feature that is particularly helpful for users with certain impairments (in this case, visual impairments) can also benefit the general userbase (see this great article on the topic from Cassidy James, one of the co-founders of elementaryOS).
We picked the wrong "twist" for the original food recommender chatbot we came up with, because users with visual impairments don't make up enough of the userbase to justify the existence of this product (I would love to see the backlash towards Grab should they ever decide to admit that publicly).
All in all, I don't know if this preference for sound business over sheer technical coolness (a term I just came up with while writing this, and I think it's pretty neat) can really be generalized to most hackathons, or if it really just came down to the background of our particular judges (in which case this lesson and the previous one are really one and the same). What I do know is that this has happened twice, which increases the significance of this lesson.