• Ranvier@lemmy.world
    ↑71 · 7 months ago

    It’s just a multiple-choice test with question prompts. This is exactly the sort of thing an LLM should be very good at. This isn’t ChatGPT trying to do the job of an actual doctor; it would be quite abysmal at that. And even this multiple-choice test had to be stacked in favor of ChatGPT.

    Because GPT models cannot interpret images, questions including imaging analysis, such as those related to ultrasound, electrocardiography, x-ray, magnetic resonance, computed tomography, and positron emission tomography/computed tomography imaging, were excluded.

    Don’t get me wrong, though: I think there are some interesting ways AI can provide useful assistive tools in medicine, especially for tasks that involve integrating large amounts of data. I do think the authors use some misleading language, saying things like the models “are performing at the standard we require from physicians,” which would only be true if the job of a physician were filling out multiple-choice tests.

    • Rolder@reddthat.com
      ↑9 ↓1 · 7 months ago

      I’d be fine with LLMs being a supplementary aid for medical professionals, but not with them doing the whole thing.

  • Etterra@lemmy.world
    ↑28 · 7 months ago

    I wonder why nobody seems capable of making an LLM that knows how to do research and cite real sources.

    • NosferatuZodd@lemmy.world
      ↑16 · 7 months ago

      I mean, LLMs pretty much just try to guess what to say in a way that matches their training data. Research usually means testing or measuring things in reality, looking at the data, and drawing conclusions from it, so it doesn’t seem feasible for LLMs to do research.

      They may be used as part of research, but they can’t do the whole thing: a crucial part of most research is the actual data, and you’d need a LOT more than just LLMs to get that.
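      To make the “guess what matches the training data” point concrete, here is a toy next-word predictor. It is a bigram counter, far simpler than a real LLM (which uses a neural network over tokens rather than word counts), and the function names and corpus are made up for illustration. It predicts whatever most often followed a word in its training text, with no notion of whether the result is true.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words followed it in the training text."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> Optional[str]:
    """Return the word seen most often after `word`, or None if `word` never appeared."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy training text: the model only learns word order, not facts.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))    # cat (followed "the" twice; every other word once)
print(predict_next(model, "sat"))    # on
print(predict_next(model, "zebra"))  # None (never seen, so no pattern to follow)
```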

      • BigMikeInAustin@lemmy.world
        ↑12 · 7 months ago

        Yup! LLMs don’t put facts together. They just look for patterns, without any concept of what they are looking at.

    • FaceDeer@fedia.io
      ↑9 ↓1 · 7 months ago

      Have you ever tried Bing Chat? It does that. LLMs that do web searches and make use of the results are pretty common now.

      • Bitrot@lemmy.sdf.org
        ↑7 · 7 months ago

        Bing Chat uses OpenAI’s GPT-4, the same model family behind ChatGPT.

        Despite using search results, it still hallucinates. Last week it told me, without a citation, that IKEA had built a model of aircraft during World War 2.

        I was trying to remember the name of a well known consumer goods company that had made an aircraft and also had an aerospace division. The answer is Ball, the jar and soda can company.

        • NateSwift@lemmy.dbzer0.com
          ↑2 ↓1 · 7 months ago

          I had it tell me a certain product had a feature it didn’t, and then cite a website hosting a copy of the user manual… which didn’t mention said feature. Having it cite sources makes it way easier to double-check whether it’s spewing bullshit, though.
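          That kind of double-checking can be partly mechanized. A minimal sketch, assuming you already have the cited page’s text and pick out the key terms of the model’s claim yourself (the function name and the manual text are hypothetical): it flags claims whose key terms never appear in the cited source, which is exactly the mismatch described here. It cannot confirm that a claim is supported, only catch some citations that clearly don’t back it up.

```python
import re


def cited_source_mentions(claim_terms: list[str], source_text: str) -> dict[str, bool]:
    """For each key term of the model's claim, report whether the cited text contains it.

    A False is a red flag that the citation may not support the claim. A True proves
    nothing by itself: the source could say the feature is absent.
    """
    text = source_text.lower()
    return {
        # \b word boundaries, so "port" does not match inside "supports"
        term: re.search(r"\b" + re.escape(term.lower()) + r"\b", text) is not None
        for term in claim_terms
    }


# Hypothetical example: text of a cited user manual and the terms of the model's claim.
manual = "The unit supports Bluetooth pairing and has a USB-C charging port."
print(cited_source_mentions(["Bluetooth", "wireless charging"], manual))
# {'Bluetooth': True, 'wireless charging': False}
```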

        • FaceDeer@fedia.io
          ↑1 ↓1 · 7 months ago

          Yes, but it shows how an LLM can combine its own generated text with information taken from web searches.

          The question I’m responding to was:

          I wonder why nobody seems capable of making a LLM that knows how to do research and cite real sources.

          And Bing Chat is one example of exactly that. It’s not perfect, but I wasn’t claiming it was. Only that it was an example of what the commenter was asking about.

          As you pointed out, when it makes mistakes you can check them by following the citations it has provided.

    • kbin_space_program@kbin.run
      ↑8 ↓1 · 7 months ago

      Because the inherent design of modern AIs is not deterministic.

      Adding a progressively bigger model cannot fix that. We need an entirely new approach to AI to do that.
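      A simplified sketch of where run-to-run variation in chatbot output typically comes from, with made-up scores and hypothetical function names (real systems add further steps such as top-p filtering, so treat this as an illustration only): the model assigns a score to each candidate next token, and the text is usually produced by sampling from those scores rather than always taking the top one.

```python
import math
import random


def softmax(logits: list[float], temperature: float) -> list[float]:
    """Turn raw next-token scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits: list[float]) -> int:
    """Always pick the highest-scoring token: the same scores give the same choice."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Pick a token at random, weighted by its probability: the choice can vary per run."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.5, 0.1]
print(greedy(logits))  # 0, on every run
rng = random.Random()  # unseeded, so draws differ from run to run
print([sample(logits, 1.0, rng) for _ in range(10)])  # mostly 0s and 1s; can change between runs
```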

      • Immersive_Matthew@sh.itjust.works
        ↑2 ↓1 · 7 months ago

        Bigger models do start to show more emergent intelligent properties, and components are being added to LLMs to make them more logical and robust. At least, that is what OpenAI and others are saying about even bigger datasets.

      • Onno (VK6FLAB)
        ↑2 ↓2 · 7 months ago

        For me, the biggest indicator that we’re barking up the wrong tree is energy consumption.

        Compare the energy required to feed a human with that required to train and run the current “leading edge” systems.

        From a software development perspective, I think machine learning is a very useful way to model unknown variables, but that’s not the same as “intelligence”.

    • BetaDoggo_@lemmy.world
      ↑3 · 7 months ago

      Cohere’s Command R models are trained for exactly this type of task. The real struggle is finding a way to feed relevant sources into the model. Plenty of projects have attempted it, but few do more than pull in the first few search results.
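      Feeding sources into the model this way is usually called retrieval-augmented generation. A minimal sketch of the prompt-building side, assuming a tiny in-memory document set and word-overlap ranking in place of a real search engine (function names and URLs are hypothetical, and this is not Cohere’s API): retrieve the best-matching sources, number them, and place them in the prompt so the answer can cite [1], [2], and so on.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased set of words; a crude stand-in for a real search index."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Rank documents by how many query words they share, keep the top k with any overlap."""
    q = tokenize(query)
    ranked = sorted(documents.items(), key=lambda item: len(q & tokenize(item[1])), reverse=True)
    return [item for item in ranked[:k] if q & tokenize(item[1])]


def build_grounded_prompt(query: str, documents: dict[str, str], k: int = 2) -> str:
    """Number the retrieved sources so the model can be asked to cite them as [1], [2], ..."""
    lines = ["Answer using only the sources below, citing them by number.", ""]
    for i, (url, text) in enumerate(retrieve(query, documents, k), start=1):
        lines.append(f"[{i}] {url}: {text}")
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


# Hypothetical documents keyed by made-up URLs.
docs = {
    "https://example.com/ball-history": "Ball Corporation history: jars, cans and its aerospace division",
    "https://example.com/ikea": "IKEA furniture catalogue and store locations",
    "https://example.com/cans": "How aluminium soda cans are made",
}
print(build_grounded_prompt("Which jar company has an aerospace division?", docs))
```

      With this toy ranking only the Ball page shares words with the question, so the prompt contains a single source, [1]. Swapping in a real search engine and better ranking is where the hard work described in the comment lies.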

  • Onno (VK6FLAB)
    ↑27 · 7 months ago

    What would be much more useful is to provide a model with actual patient files and see what kills more people, doctors or models.

  • theluddite@lemmy.ml
    ↑19 · 7 months ago

    All these always do the same thing.

    Researchers reduced [the task] to producing a plausible corpus of text, and then published the not-so-shocking results that the thing that is good at generating plausible text did a good job generating plausible text.

    From the OP, buried deep in the methodology:

    Because GPT models cannot interpret images, questions including imaging analysis, such as those related to ultrasound, electrocardiography, x-ray, magnetic resonance, computed tomography, and positron emission tomography/computed tomography imaging, were excluded.

    Yet here’s their conclusion:

    The advancement from GPT-3.5 to GPT-4 marks a critical milestone in which LLMs achieved physician-level performance. These findings underscore the potential maturity of LLM technology, urging the medical community to explore its widespread applications.

    It’s literally always the same. They reduce a task until ChatGPT can do it, then report in the headline that it can, with the caveats buried much later in the text.

  • Poe@lemmy.world
    ↑10 · 7 months ago

    Neat, but I don’t think LLMs are the way to go for this sort of thing.

    • BolexForSoup@kbin.social
      ↑4 · 7 months ago

      I don’t mind so long as all results are vetted by someone qualified. Zero tolerance for unfiltered AI in this kind of context.

      • Skua@kbin.social
        ↑3 · 7 months ago

        If you need someone qualified to examine the case anyway, what’s the point of the AI?

          • Skua@kbin.social
            ↑1 · 7 months ago

            In the test here, it literally only handled text. Doctors can do that. And if you need a doctor to check its work in every case, it has saved zero hours of work for doctors.

              • Skua@kbin.social
                ↑1 · 7 months ago

                how high processing power computers with AI/LLM’s can assist in a lab and/or hospital environment

                This is an enormously broader scope than the situation I actually responded to, which was LLMs making diagnoses and then getting their work checked by a doctor.

          • Skua@kbin.social
            ↑1 · 7 months ago

            In the example you provided, you’re doing it by hand afterwards anyway. How is a doctor going to vet the work of the AI without examining the case in as much detail as they would have without the AI?

            • BolexForSoup@kbin.social
              ↑1 · 7 months ago

              Input symptoms and patient info -> spits out odds they have x, y, or z -> doctor looks at that as a supplement to their own work, or to look for more unlikely possibilities they haven’t thought of because they’re a bit unusual. Doctors aren’t gods; they can’t recall everything perfectly. It’s as useful as any toxicology report or other information they get.

              I am not doing my edits by hand. I am not using a blade tool and spooling film. I am not processing it. My computer does everything for me; I simply tell it what to do and it spits out the desired result (usually lol). Without my eyes and knowledge, the inputs aren’t good and the outputs aren’t vetted. With a person, both are satisfied. This is how basically all computer usage works, and AI tools are no different: input -> output, and the quality depends on the computer/software and on who is handling it.

              TL;DR: Garbage in, garbage out.
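              The “input symptoms, get odds for x, y, or z” workflow can be sketched with a naive Bayes calculation. This is a swapped-in textbook technique, not what any particular product uses, and every prior and likelihood below is invented purely for illustration (conditions named x, y, z as in the comment). It combines how common each condition is with how well it explains the reported symptoms, and the output is a ranked list for a doctor to review, not a diagnosis.

```python
import math

# Invented numbers for illustration only; these are NOT medical data.
PRIORS = {"x": 0.70, "y": 0.25, "z": 0.05}  # how common each condition is
LIKELIHOODS = {                              # P(symptom present | condition)
    "x": {"fever": 0.3, "rash": 0.1},
    "y": {"fever": 0.8, "rash": 0.2},
    "z": {"fever": 0.9, "rash": 0.9},
}


def rank_conditions(symptoms: set[str]) -> list[tuple[str, float]]:
    """Naive Bayes: treat symptoms as independent given the condition, return normalized odds."""
    log_scores = {}
    for cond, prior in PRIORS.items():
        score = math.log(prior)
        for symptom, p in LIKELIHOODS[cond].items():
            # A symptom that is absent is evidence too, so use 1 - p for it.
            score += math.log(p if symptom in symptoms else 1 - p)
        log_scores[cond] = score
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return sorted(((c, w / total) for c, w in weights.items()), key=lambda cw: cw[1], reverse=True)


for cond, prob in rank_conditions({"fever", "rash"}):
    print(f"{cond}: {prob:.2f}")  # prints z: 0.40, then y: 0.39, then x: 0.21
```

              With both symptoms reported, the uncommon condition z edges just ahead of y despite its 5% prior, which is the kind of unusual possibility the comment suggests a doctor might want flagged.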

          • Skua@kbin.social
            ↑1 · 7 months ago

            Usually to do work that needs doing but doesn’t need the direct attention of the more skilled person. The assistant can do that work by themselves most of the time. In the example above, the assistant does all of the most challenging work and then the doctor checks all of it.

  • loathsome dongeater@lemmygrad.ml
    ↑4 · 7 months ago

    This research has been done many times, but I don’t see the point of it. Exams are something I would expect LLMs, especially the higher-end ones, to do well on because of their nature. But it says next to nothing about how reliable the LLM would be as an actual doctor.

    • gregorum@lemm.ee
      ↑2 · 7 months ago

      Even those who do well in tests of rote knowledge can perform poorly in practical exercises. That’s why medical doctors have to train and qualify through several years of supervised residency before being allowed to practice even basic medicine.

      GPT-4 can’t do even that.

    • beardown@lemm.ee
      ↑1 ↓1 · 7 months ago

      But it says next to nothing about how reliable the LLM as an actual doctor.

      Do these tests say anything about how a human would perform as an actual doctor, then?

      • loathsome dongeater@lemmygrad.ml
        ↑1 · 7 months ago

        It says as much as it does for an LLM, but doctors need a lot of field experience after passing these tests before they get certified.

        • beardown@lemm.ee
          ↑1 ↓1 · 7 months ago

          Then we should remove such tests and, if anything, increase such field experience.

            • beardown@lemm.ee
              ↑1 ↓1 · 7 months ago

              Because clearly passing such tests doesn’t matter. If it did, the fact that GPT could pass them better than many doctors would be noteworthy and would have implications for the labor value of doctors.

              • my_hat_stinks@programming.dev
                ↑1 · 7 months ago

                If you can’t read a licence plate at 20 metres you can’t safely drive. Being able to read a licence plate at 20 metres does not make you a safe driver. The test still matters.

              • loathsome dongeater@lemmygrad.ml
                ↑1 · 7 months ago

                Tests are meant to gatekeep who gets the field training required to become a doctor. Sending every jabroni into residency willy-nilly would probably collapse the healthcare system completely.

                • beardown@lemm.ee
                  ↑1 · 7 months ago

                  That wouldn’t collapse the healthcare system; it would push down doctors’ salaries, which would be good for everyone else because it would lower costs. That is what has happened to practically every other profession.

  • roguetrick@kbin.social
    ↑2 · 7 months ago

    The 17th percentile in peds is not surprising. The model mixing up its adult training data with pediatrics would absolutely kill someone.

  • Aussiemandeus@aussie.zone
    ↑5 ↓6 · 7 months ago

    Google started killing the doctor industry (GPs). AI will finally be the nail in the coffin, except doctors will never give up the power to prescribe.

    • BigMikeInAustin@lemmy.world
      ↑10 ↓1 · 7 months ago

      LLMs can’t design experiments or think of consequences or quality of life.

      They also don’t “learn” from asking questions or from a 1-time input. They need to see hundreds or thousands of people die from something to recognize the pattern of something new.

      • Aussiemandeus@aussie.zone
        ↑1 ↓4 · 7 months ago

        Yeah, but they can give the common answers of bed rest and hydration that are a doctor’s go-to for everything.

        I imagine a future where LLMs take over the menial duties: you have a cold, you have high blood pressure, etc.

        So actual doctors spend more time on less menial tasks.

        But since, as a society, we develop automation and then fire everyone around it, I can’t see it really happening.

        • BigMikeInAustin@lemmy.world
          ↑2 · 7 months ago

          In a society that valued preventative healthcare, people would get deep scans regularly when healthy, and an AI would take up the menial work of sifting through the large amount of extra data to detect issues early. Theoretically an AI would give the same amount of attention to the first scan of the day as the last scan of a 12 hour day.