ATM I feel like LLM writing tests can be a bit dangerous at times, there are cases where it's fine there are cases where it's not. I don't really think I could articulate a systemised basis for identifying either case, but I know it when I see it I guess.
Like the the other day, I gave it a bunch of use cases to write tests for, the use cases were correct the code was not, it saw one of the tests broken so it sought to rewrite the test. You risking suboptimal results when an agent is dictating its own success criteria.
At one point I did try and use seperate Claude instances to write tests, then I'd get the other instance to write the implementation unaware of the tests. But it's a bit to much setup.
I work with individuals who attempt to use LLMs to write tests. More than once, it's added nonsensical, useless test cases. Admittedly, humans do this, too, to a lesser extent.
Additionally, if their code has broken existing tests, it "fixes" them by not fixing the code under test, but changing the tests... (assert status == 200 becomes 500 and deleting code.)
Tests "pass." PR is opened. Reviewers wade through slop...
The most annoying thing is that even after cleaning up all the nonsense, the tests still contain all sort of fanfare and it’s essentially impossible to get the submitter to trim them because it’s death by a thousand cuts (and you better not say "do it as if you didn’t use AI" in the current climate..)
Yep. We've had to throw PRs away and ask them to start over with a smaller set of changes since it became impossible to manage. Reviews went on for weeks. The individual couldn't justify why things were done (and apparently their AI couldn't, either!)
If you’re assessing things entirely on a strategic basis makes total sense. It’s understandable why they are doing it but I wouldn’t go so far as to say it’s underrated or suggest there are no drawbacks.
Duplicating things without reason is wasteful. With a hobby project sure that’s your own time and is likely more an act of consumption and personal fulfilment. But these are national economic resources being redirected away from other things.
In software in a large codebase where there are coordination costs with reuse due to the organisation structure, there’s a strategic reason not to reuse, but it might highlight a limitation of the organisation structure, but that’s not something someone making the call to reuse code or not can do much about.
Likewise France really can’t do much about the state of the US and dependency is understandably seen as a risk.
I mean there are pros and cons to many things. What I mean by underrated is just that a lot of people say "oh duplication, how wasteful" and don't realize the benefits that may exist in redundancy and diffusion. I think the US would benefit right now if there was more "duplication" in the sense of greater diversity across many industries. More car makers, more film studios, more news organizations, more social media companies, more record labels. Not more stuff --- not more cars, more films, more news, more social media, more records --- but just the same stuff spread over a greater number of entities. The consolidation we've seen over the past several decades is a bad thing.
You know you CAN actually quantify how bad or good these things are in what respect and their second order effects. The trades off are pretty well understood. Increasing returns to scale are a thing, as are natural monopolies where consolidation is more efficient even with the headaches that comes with regulating a monopoly.
Car makers, entertainment companies, news organisations are very different kinds of industries to the ones we’re talking about here. They aren’t natural monopolies and don’t feature increasing to scale (at all output levels). In media, the reasons we’re seeing consolidation is due to entry barriers primarily with how IPs protections work. This is entirely unrelated.
Also you’re talking about this entirely from a consumers point of view. From economy wide point of view, duplication of a product will pull resources away from other industries that might be more profitable for a country. Which is bad for the same reason tariffs are bad. These are real costs that will affect quality of life and crowd out desirable economic activity.
Just circling back to this original article. This is arguably not one of those cases.
But redundancy and duplication purely on principle is dogmatic and shortsighted, and yes wasteful. We don’t have infinite resources in the world.
Unless you’re referring to academic paper, I’m not getting a pay wall.
I read the article (but not the paper), but it doesn’t sound like a no. But I also don’t find the claim that surprising given in other languages word matters a lot less.
In languages where word order matters a lot less, the grammar is still there---it just relies more on things like case markers and agreement markers (i.e. morphology).
The paper is basically saying “we have evidence that supports language comprehension inconsistent independently of structural hierarchy” [1] (or at least that’s my read of it).
However I imagine linguists have a more precise definition than most of us, but instead of speculating, I’ve decide read the paper.
Something they explain early on is a concept called multi-words (an example incomplete this is an idiom) tend communicate meaning without any meaning grammatical structure, and they say this
> “… multiword chunks challenge the traditional separation between lexicon and grammar associated with generativist accounts … However, some types of multiword chunks may likewise challenge the constructionist account.”
I’m an amateur language nerd with a piecemeal understanding of linguistics, but I’m no linguist so I don’t know what half this means, but it really sounds like they have a very specific definition here, that neither of us are talking about, and possibly hasn’t been well communicated in the article.
That said I’m out of my depth here, and I have a feeling most ppl replying to this article are probably too if they are going off the title and article that linked to the paper. But I would be interested to hear the opinion of a linguist or someone more familiar with this field, and their experimentation methods.
—-—-—-—-—-
[1] With the hypothesis testing typically done in science you can’t really accept a alternative hypothesis only reject a null one given you’re evidence, so you get titles like “may” or “might” or “evidence supporting x, y, z”, so you get these noncommittal titles like the one. In social sciences or nonnatural sciences I feel this is even more the case given the difficulty of forming definitive experiments without crossing some ethical boundary. In nature science you can put to elements together control variables see what happens in social sciences it’s really hard.
>multiword chunks challenge the traditional separation between lexicon and grammar associated with generativist accounts
This is just silly (the paper, not your comment). Do these folks really think they're the first people to think of associating meanings with multi-word units? Every conceivable idea about what the primes of linguistic meaning might be has been explored in the existing literature. You might be able to find new evidence supporting one of these ideas over another, but you are not going to come up with a fundamentally new idea in that domain.
As another commentor has pointed out, many of the sequences of words they identify correspond rather obviously to chunks of structure with gaps in certain argument positions. No-one would be surprised to find that 'trees with gaps to be filled in' are the sort of thing that might be involved in online processing of language.
On top of that, the authors seem to think that any evidence for the importance of linear sequencing is somehow evidence against the existence of hierarchical structure. But rather obviously, sentences have both a linear sequence of words and a hierarchical structure. No-one has ever suggested that only the latter is relevant to how a sentence is processed. Any linguist could give you examples of grammatical processes governed primarily by linear sequence rather than structure (e.g. various forms of contraction and cliticization).
I think their point was the meaning of multi-words isn't the result of structure or word order, such as many idioms for example aren't interpreted literally or their grammar isn't too important.
But this is also academia they want to have evidence behind claims even if they feel intuitive. Like in the social sciences you'll have models and theories that are largely true in a lot of cases, but fail to explain variance from the models. The constructivist and whatever stuff sounds like one of those larger models and they are pointing out where it falls short, not to entirely invalidate it but to show the model has limitations.
I have a feeling the authors are well aware they aren't the first people to consider this, but they did leg work to provide some empirical evidence about the claim. Which is something you want to have in challenging the orthodoxy of a field. Entirely possible they're working on a larger piece of work but they're being asked to demonstrate this fact which this larger piece of work rests on. But I'm largely speculating there.
> On top of that, the authors seem to think that any evidence for the importance of linear sequencing is somehow evidence against the existence of hierarchical structure
The way I see it if you can demonstrate comprehension in the absence of this structure, I think you can make the case that it is optional and therefore may not rely on it. Which is a different claim from it benefits in no way whatsoever, which I don't think their evidence necessarily challenges (based on my read)
My view is when a language depends a lot on complex grammar what's happening is its trying resolve ambiguity, but languages can address this problem a number of ways. In languages like Russian they handle more of this ambiguity in inflection (and many non-English indo-european languages), in tonal languages to some extent tone creates a greater possible combination of sounds which could provide other ways of resolving ambiguity. That's my guess at least, I also accept I have no idea what I'm talking about here.
> if you can demonstrate comprehension in the absence of this structure,
> I think you can make the case that it is optional and therefore may not
> rely on it.
One kind of example demonstrating the importance of structure is wh-movement (the appearance of a word like 'who' or 'what' at the beginning of a sentence, when the argument it is asking about would be somewhere deeper inside the structure). For instance "Who did John say that Mary had a fight with __?" (I've represented the position of the argument with the __.) It's been known since the 60s that there are lots of constraints on wh-movement, e.g. *"Who did John say he knew the person who had a fight with __?" (vs. the non-wh-movement sentence "John said he knew the person who had a fight with Bill.")
>the meaning of multi-words isn't the result of structure or word order
Surely the 'word order' part must be a mistake here? Clearly word order influences the interpretation of sequences of English words. As for structure, the paper presents no evidence whatever that structure is not involved in the interpretation.
>many idioms for example aren't interpreted literally
This is just the definition of what an idiom is, not any kind of insight.
yeah the messaging is somewhat insecure in that it preemptively seeks to invalidate criticism by just being an experiment while simultaneously making fairly inflammatory remarks about nay sayers like they'll eat dirt if they don't get on board.
I think it's possible to convey that you believe strongly in your idea and it could be the future (or "is the future" if you're so sure of self) while it still being experimental. I think he would get less critics if he wasn't so hyperbolic in his pitch and had fewer inflammatory personal remarks about people who he hasn't managed to bring on side.
People I know who communicate like that generally struggle to contribute constructively to nuanced discussions, and tend to seek out confrontation for the sake of it.
I think I’d be interested in seeing something written about architectural decisions and how the architecture and your experience writing it differed from other non GPU projects (charting ideally but non-charting is fine too), and any unique hurdles you encountered in building this project?
Ah no... this is people's money, and they likely came to conclusion the US bonds are inconsistent with the funds goals and risk appetite. Within the first few paragraphs of the article you see this:
> The decision is rooted in the poor U.S. government finances, which make us think that we need to make an effort to find an alternative way of conducting our liquidity and risk management
If you're dealing with peoples pensions, even if there are higher growth portion of the funds allocation, you've got to make sure there are portions of the fund that's stable enough to be regularly liquidated to send out regular payments.
Given the whole hoo-ha with trump trying appoint their own guy into the federal reserve, it isn't that surprising the fund managers have decided to decrease their allocation.
In case you’ve forgotten this is a thread on an article about web components.
If the underlying premise of your point was entirely independent of web components you’ve done pretty poor job of communicating it.
So you actually do that? Use custom elements without web components instead of using classes? Are you using in something like react with custom elements foregoing type safety to avoid a div element? Or is this just in plain HTML? How many custom elements does your typical web project have?
Or are you fixating on an irrelevant technicality to make an irrelevant point?
The article is literally about not using web components. FTA:
> What would happen if we took the ideas of HTML Web Components and skipped all the JavaScript? [...] Okay great, we styled some HTML nested inside a custom element.
Let’s not pretend like your original point here was they were barely using features of web components correctly, like the parent of this thread was clearly implying.
I disagree on the number of elements actually approaching problematic territory, but agree this just isn’t something you can’t do already without web components
More professional organisations definitely have some kind of CMS, with potentially their own version management (at least for what’s published). But I also don’t think we can fault people for preparing their piece in their preferred writing tool.
I just can’t see existing news agencies doing this of their own volition. As Generating stories themselves is what keeps news agencies in business.
Unless they had a new competitor who had who kept running rings around them with all three features. But it’s going to come back to having better stories or better long form pieces (depending on the publications niche), as that’s ultimately why someone visits their site.
I could however see some 3rd party doing this like an extension that overlays someone’s site or acts as alternative presentation of their content.
I think a lot people underestimate how arbitrary some editorial decisions on wikipedia can be. Yeah perfect is the enemy of the good but imperfect is still imperfect. Can’t say I’m a fan of jj mccullough‘s opinions on some stuff but his video on wikipedia is good https://youtu.be/-vmSFO1Zfo8?si=0mS24EVODwLrPJ3T
I don’t feel as strongly as he does but ever since watching I just don’t see much value in starting with Wikipedia when researching something. He also points out how a lot content creators default to referencing it. After realising how much of history or geography YouTube is just regurgitating Wikipedia articles, it kind of ruined those kinds of videos for me, and this was before AI. So now I try spend more time reading books or listening to audiobooks on a topics I’m interested instead.
Like I still use Wikipedia for unserious stuff or checking if a book I was recommended was widely criticised or something but that’s it really.
It’s also just not a good learning resource, like if you ever wanted to study a mathematics topic, wikipedia might be one of the worst resources. Like Wikipedia doesn’t profess to be a learning resource and more a overview resource but even the examples they use sometimes are just kind of unhelpful. Here’s an example on the Fourier Transform https://youtu.be/33y9FMIvcWY?si=ys8BwDu_4qa01jso
> I think a lot people underestimate how arbitrary some editorial decisions on wikipedia can be.
I think it is true for all information we consume. One of the very important skills to learn in life is to think critically. Who wrote this? When? What would be their bias?
Text is written by humans (or now sometimes LLMs), and humans are imperfect (and LLMs are worth what they are worth).
Many times Wikipedia is more than enough, sometimes it is not. Nothing is perfect, and it is very important to understand it.
Also, I think for 99% of Wikipedia, there isn't much need to worry about Biases. It's about an uncontroversial chemical compound, a tiny village, a family of bacteria and so on. Knowledge isn't all subjective and prone to bias.
My point was mostly, people just aren’t as aware of issues with it compared to other forms of media. Issues in other forms of media don’t change that or make it less of an issue.
At the end of the day you’re gonna to consume information from somewhere, it’ll have shortcomings but you’re still better off knowing that going in.
On bias’ of authors: I actually think people fixate a bit too much on bias of an author to the point it’s a solely used as a speculative reason to dismiss something asuntrue. If the claims made by the author are consistent with other information and others trusted sources it’s just irrelevant. I feel people online to readily get hung up motivations and it’s sometimes a crotch for a readers inability to engage with ideas they find uncomfortable.
Like if a private company sponsors a study with a finding that aligns with their business interests, that actually doesn’t mean it’s false. It’s false if no one can reproduce their results. I mean you’d definitely want to verify other sources knowing this, but also researches have their own reputation to preserve as well. In reality the truth ends up being more boring than people anticipate.
But obviously it matters when claims can’t be verified or tested but I find online there’s an overemphasis of this online.
Critical thinking does not mean that you dismiss the information. It just means that you take the potential bias into account.
The media are often pretty bad at doing this: they will often make some kind of average on what is being said, like "the scientific consensus says that cigarettes are killing you, but a study sponsored by Philip Morris says that they are not, so... well we don't know". Where actually it should be pretty obvious that Philip Morris is extremely biased on that, and the scientific consensus is not.
Not every voice is worth the same. During covid, there was a tendency to relay all kinds of opinions, without making the difference between actual experts and non-experts. Sometimes even saying "this person is a doctor, so they know", which is wrong: being a doctor doesn't make you an expert on coronaviruses or epidemiology.
Whenever we get information, we should think about how much trust we can put into it, how biased the authors maybe (consciously or not), etc. Elon Musk saying that going to Mars can help humanity is not worth much. Because he is rich and successful does not make him right. Yet many people relay "Musk predicts that [...]", as some kind of truth.
I guess I had public discourse in mind when I was saying people to readily invoke claims of bias. Also alternative media which tends to be on the other extreme of being overly cynical.
If PM appeared on the news obviously no one would believe them.
That said in Australia we in the last few years we’ve increased the cigarette tax, smoking hasn’t really decreased, but treasury has reported decreased revenue. It clearly looks like the tax has been increased too high if sellers are illegally selling untaxed cigarettes.
It would be very dumb of a cigarette company like PM to come out and point this out (as it would just be a springboard for proponents of the tax to play attack others pointing it out the issue atm), but if they did, it wouldn’t mean it’s not happening. Even if they have a bias it would be irrelevant.
Speculation around bias is just treated too much of smoking gun, and claims of it are more often motivated reasoning not critical thinking.
Small typo though: I believe you meant "crutch" not "crotch" in:
> feel people online to readily get hung up motivations and it’s sometimes a crotch for a readers inability to engage with ideas they find uncomfortable.
> people just aren’t as aware of issues with it compared to other forms of media.
Really? I'd think it would be the opposite. Wikipedia has always been decried by academics (and primary school teachers) as "not a real encyclopedia", without giving anywhere near as much of a critical eye toward other sources of information.
Sure, I think Wikipedia's reputation and public image has gotten better over the years, but that stigma of it being created and written by "unprofessional anonymous people" is still there to some extent.
And regardless, the kind of person who is going to watch Fox News or CNN without applying any critical thought to what they hear there... well, probably is going to do the same for Wikipedia pages, or any other source of information.
I think academics are too critical for a source of general surface level knowledge. But it’s no substitute for primary sources
I don’t think the problem is anyone can jump on and edit Wikipedia, they have process, but it’s the processes, informal institution’s, where the issues I’m referring arise. The average person hears there a process and assume this means it’s legitimate and flawless and are over confident in its quality.
It’s a great resource but I tin it’s helpful to be realistic about its limitations.
The problem with Wikipedia as an Academic source is that it's impossible to cite. You have no idea whether the information on there today is going to be there tomorrow or was there yesterday.
> I think a lot people underestimate how arbitrary some editorial decisions on wikipedia can be.
You can say that about Encyclopedia Brittanica or any of the old-school encyclopedias too. You can say that about the news desks at ABC, CBS, CNN, etc. You can say that about the New York Times, Washington Post, Guardian, etc.
I don't think people tend to blindly trust Wikipedia any more than they do for other sources of information. YouTube is full of garbage Wikipedia-regurgitating articles because Wikipedia is an easy, centralized source to scrape, not because of any level of trust they put in it.
I find this type of snap negative reaction boring, tiring, and unhelpful. It's disappointing that they often end up as top comments here. (Human psychology at work.)
My take: I expect that Wikipedia is more unbiased and a better reflection of reality then most -- maybe even all -- other sources of information on the Internet. On average! There are certainly crap articles, just like anywhere else.
If you look at the news in a democratic country vs an authoritarian one, you may easily walk away with the impression that the former is in a state of perpetual chaos, because of all the scandals, protests, resignations and snap elections. The authoritarian country will look like a paragon of stability in contrast. New infrastructure projects, record economic growth, seditious officials swiftly trialed and imprisoned. There is barely any conflict and the ones that do exist get solved quickly.
But unless you are a total mark, you should know that the stability is just a facade. That infrastructure project only went through because locals who opposed got beaten up by the cops, the economics data was cooked up by statisticians who fear the consequence of telling the truth and the seditious officials are only at the receiving end of justice because they lost the power struggle within the party. But of course you don’t know any of that, because why would the state let you?
Wikipedia, like democracies, run on transparency. This is why you get to read the editing history and talk page of any Wikipedia page and walk away with the impression that Wikipedia is uniquely full of drama. You never feel the same about the New York Times or the BBC because they run more like autocracies and keep everything inside. If we get a chance to read the internal emails of establishment media we will walk away with a very different impression.
I’ve said to a few other replies, but tbc I wasn’t promoting other mediums as alternatives.
> I don't think people tend to blindly trust Wikipedia any more than they do for other sources of information
I actually disagree or at least I think the extent to which people do is higher than it warrants. Especially to the degree people invoke it’s contents online
> I find this type of snap negative reaction boring, tiring, and unhelpful.
I didn’t ask for this to be the top comment, nor did I make you read it. I also don’t think it’s useless, just less useful than its proponents claim it is. And I think people do themselves a disservice in not looking beyond it when looking into a topic
I disagree, I know the opinion of WSJ, WP, FT or national like france24,DW, BBC, RT,AJ
Or at least know is always opinion Base, the facts are selected in a subjecive way.
Is way harder to know how opinionated Wikipedia is, and everything make them sound like their opinion is only base on facts but isn't.
It's funny how the more accurate a source gets the more it draws in people desiring accuracy.
Then this rather small cohort of high precision people express frustrations without providing the context of accuracy against the masses preferred methods (TikTok, cable news, broadcast, truth social)
So now the water is muddled and people and Ais are mistrained because an "absolute scale" is not used when discussing accuracy.
Idk if this how it came off but just tbc my point also wasn’t indirectly promoting traditional media.
I think a lot if ppl are rightly sceptical of traditional media, but I feel I see more people giving Wikipedia a pass or placing it on a higher pedestal as a resource than it should be at times.
Admittedly I think I would prefer Wikipedia to traditional media in most cases. Although that wasn’t really what I was getting at
Most people don't even have the reading level for full comprehension of a wiki article, let alone being able to discern the nuance of some aspects of the topic.
> Yeah perfect is the enemy of the good but imperfect is still imperfect.
This assumes perfection is attainable. I'd like to see your idea of a "perfect" book or article on some topic.
I think you’re assuming I’m calling for perfection and suggesting people abstain from something less than perfect, we live in reality most things aren’t perfect.
Even if Wikipedia was the least worse resource people would still do themselves a disservice in ignoring issues with it. Acknowledging issues isn’t the same as dismissing it entirely.
How is this elitist? These other resources are more accessible than ever, no gate keepers are keeping anyone from looking at them. I’m also not making any judgments about anyone who uses Wikipedia either.
It's not supposed to be a learning source, it's not supposed to be an exhaustive reference on topics, it's not supposed to replace books and it has to be editorialized a lot to match the format.
It's by far the best encyclopedia ever created by mankind, on all metrics, but it's fundamentally an encyclopedia and nothing else.
And similarly about History and Geography YouTube, the problem isn't that they are regurgitated from Wikipedia, it's just YouTube is an entertainment platform.
JJ McCullough‘s gripe seems to be that wikipedia is kind of a mediocre summary of the information on a topic. But I'm not sure you should expect much more. You can always go to individual sources if you want that.
I've seen proudly uneducated people with no understanding take sledgehammers to history and real knowledge, and so I have no illusions about how Wikipedia is horrible, unfair, unprofessional, mercurial, and vulnerable to manipulation.
I would've gladly paid more in taxes to make Encyclopædia Britannica an international non-profit public service delivered in web form to all so long as each area were managed and curated with subject matter expert input.
I think if anything if Wikipedia contributes to encouraging someone take an interest in these things like this, despite not in school, that’s definitely a positive.
But if these are moderators on Wikipedia that sounds pretty bad, but even that would surprise me.
It’s tricky cause I think being dependent on funding would also expose it to censorship risk. Like if it was funded by the UN then it’s remit might be limited by what some more influential nations would allow. At the same time you might end up with higher quality resources in the less controversial topics of what’s allowed? But also alternatively bias ones on politically controversial topics.
If it was funded by individual nations it’s same problem possibly worse, if it’s funded by “benevolent” individuals there’s a risk it becomes tool to propagate their personal ideology. Even if it doesn’t, it would probably affect the reputation of the publication.
So in this respect I think this is one case where Wikipedia has some obvious appeal. I think they may need to elevate their standards in their moderation. It feels a bit elitist but maybe some kind of signal for qualifications in a field, or at least specific fields (like natural sciences).
That said political interference aside, some fields are very subjective or operate on lines of thinking that’s can’t be challenged through falsification, are last in the social sciences. Like history can be very political.
Maybe a nonfree encyclopaedia isn’t so bad if it’s free from these issues, but can it even be sustained by a market? idk
Reading (the right) books is definitely the best way to learn about a topic, but its not great for quickly looking up random stuff. Books can spread misinformation too, from Malleus Maleficarum to Erich von Däniken.
It is useful for quickly looking up simple facts, and provides a list of sources.
The video makes some interesting criticisms. The lack of diversity is not surprising. Dominated by white, male, American's with time on their hands! how would have thought that? Its very obviously American dominated (at least the English version).
I once partly cross verify a virologist's lecture. He confused a brother of an important scientist who made an important discovery. I have no doubt that he knows what he's talking about when it comes to viruses.
All in all, checking other sources to see if they lines up is a pain and labor intensive, never mind actually checking to see if the references are actually sound evidence.
holy heck there is so much wrong about this video. i can't believe "internet influencers" can just turn on their cameras and spew so much untruth without a care in the world...
comparatively wikipedia is imperfect, but much better than this kind of slop.
Feel free to actually articulate the actual issues you’re referring to.
It’s been a while since I watched it but the thing I remember taking away was you can do a lot better than Wikipedia, and he encouraged people to spend more time looking at primary sources for deeper research, and points out how it’s the basis of a lot of slop on YouTube.
Like the the other day, I gave it a bunch of use cases to write tests for, the use cases were correct the code was not, it saw one of the tests broken so it sought to rewrite the test. You risking suboptimal results when an agent is dictating its own success criteria.
At one point I did try and use seperate Claude instances to write tests, then I'd get the other instance to write the implementation unaware of the tests. But it's a bit to much setup.
reply