Integral Review

Reviewing OpenAI's Deep Research: A good starting point

Yesterday, OpenAI released its latest agent: Deep Research. I spent a few hours trying it out. Here are my thoughts, and some thoughts on what this means for tools like Perplexity or even OpenAI's Operator.

Deep Research has 3 characteristics that make it highly interesting and differentiated compared to OpenAI's other tools:

  1. o3 model: Under the hood, Deep Research uses the o3 model which has been teased for a while by OpenAI. Its performance is a significant improvement compared to their first reasoning model, o1, but the cost is supposedly still too high to allow for a full release. Nevertheless, Deep Research already at least partially uses it and this is absolutely felt in the quality of the answers as we'll see below.
  2. Web access: Deep Research will do web searches before it answers and is also able to take some browser actions. It's hard to know precisely what the model does on those webpages and how it navigates, but this access to numerous online sources for a single response already helps vastly improve the quality of the output.
  3. A long detailed output: o1 models are very comfortable providing long outputs and Deep Research continues that path. The output makes the necessary wait time acceptable.

Those three already provide vastly better results than ChatGPT, even with web access enabled. But the outcome is still limited by some factors. The most important one is that the model is web-only, so it doesn't have access to research papers (unless those are available as PDFs online, which sadly they often aren't) or to books. This is a hard problem to solve, as OpenAI is already in hot water regarding the source of its training data, but I do hope some solution is found there.

Market research

Here are some of the tests I made.

I tried a request similar to the one I made on Operator a couple of days ago and asked for some market research on flashcard software similar to Anki. The output was miles above what I had before. But it still was far from perfect:

Note the sources for pricing data and that some is even missing (or weirdly TBD)


For the pricing, for example, we can see that the source is almost never the product's own website but instead third parties. Unsurprisingly, some information is wrong and some is missing. I would be curious to understand how Deep Research is built, but my guess is that it accumulates the sources and then generates the output. It doesn't take a step back to ask whether it's missing critical information (for example, pricing on a product), and it doesn't double-check the validity of sources. Any researcher tasked with the same assignment would have made some final verifications once a first draft of the report had been produced. I'm sure those problems will be fixed, but for the time being they do negatively impact the quality of the output.
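My guess about the pipeline, and the verification step I think is missing, can be sketched in a few lines. This is purely speculative: all of the function names here (`search_web`, `generate_report`, `find_gaps`, `verify_claim`) are hypothetical placeholders, not any real OpenAI API.

```python
# Hypothetical sketch of the research loop Deep Research seems to use,
# plus the "step back" and fact-checking passes I suspect it lacks.
# Every callable here is a placeholder supplied by the caller.

def naive_research(query, search_web, generate_report):
    """Accumulate sources, then write the report in a single pass."""
    sources = search_web(query)
    return generate_report(query, sources)

def research_with_verification(query, search_web, generate_report,
                               find_gaps, verify_claim):
    """Same loop, but step back to fill gaps, then fact-check claims."""
    sources = search_web(query)
    draft = generate_report(query, sources)
    # Step back: is critical information (e.g. a product's pricing) missing?
    for gap in find_gaps(draft):
        sources += search_web(gap)
    draft = generate_report(query, sources)
    # Double-check each claim against the gathered sources before finalizing.
    return [claim for claim in draft if verify_claim(claim, sources)]
```

The extra passes are exactly what a human analyst does with a first draft; they also roughly double the number of model calls, which is presumably why they were left out.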

If you're curious, here's the full output.

Health-related research

Another test query I did was to ask for the best supplements one could take to optimize for general health and longevity. I thought this would be an interesting query, as it is a very broad topic with lots of noise and commercial interests. Here is the full output.

Overall, I found the output in this case quite disappointing. It seems like it can be partially explained by the approach Deep Research took.

These are the first few steps it took.


Deep Research started with the most naïve possible search and then spent the rest of its effort gathering additional information on each individual supplement. For a quick search this might be acceptable, but it isn't what I would expect from a research analyst. Again, it would have made more sense to take a step back: What are the main components of good physical health as one grows older? What are the main diseases causing early deaths? From there, one would figure out which supplements could help with each, and finally search for research to back up those claims.

Note that Deep Research still wouldn't have had access to most of those research papers aside from the abstract, but the output would still have been better.
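The top-down process I'm describing can be sketched as a small planning step that runs before any source-gathering. The decomposition function and the sub-questions in the usage below are illustrative assumptions on my part, not Deep Research's actual plan.

```python
# A sketch of the top-down, branching research plan I'd expect from an
# analyst: break the broad topic into narrower sub-questions first, and
# only then gather sources for each leaf. `decompose` is a placeholder
# for whatever (human or model) produces the sub-questions.

def build_research_plan(topic, decompose, max_depth=2):
    """Recursively split a broad topic into a tree of sub-questions."""
    if max_depth == 0:
        return {topic: []}
    return {topic: [build_research_plan(sub, decompose, max_depth - 1)
                    for sub in decompose(topic)]}
```

Searching the leaves of such a tree, instead of the root question, is what would have turned "best supplements for longevity" into targeted queries about aging, disease risk, and the evidence for specific interventions.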

Reviewing Deep Research: Final thoughts

Deep Research is a good addition to OpenAI's tools. The ability to investigate a subject and summarize content across dozens of pages can and will be a huge time-saver. The new o3 model can be felt in the quality of the final output. The final report tends to be packed with information and is genuinely interesting to read.

What I found suboptimal was the process, both in searching for data and in finalizing the report. For any kind of deep research, you would expect to explore multiple branches of information before combining everything, and finally to double-check all the facts. For the moment, OpenAI's Deep Research only takes a very basic approach to searching for information. And once the report is produced, it fails to take a step back to decide what further research or verification might be necessary. Thankfully, those things can and will be improved over time. Part of the problem is that the o3 model is already supposedly very expensive to run, and those improvements would significantly increase token consumption. But as we have seen, costs continually go down, so it's only a matter of time.

This creates, however, a problem in the short term: I found the output to still contain a significant number of mistakes, either because of a lack of understanding of the web pages or due to a lack of further research to fact-check existing assertions.

The current release is an interesting first implementation. I could almost see myself using it, but the mistakes in the output mostly keep me from doing so. These problems can be fixed, and I look forward to the improvements that should soon come.

#ai #productivity #review
