The moon landing? A spectacle. The internet? Not the cool hideout we think.

Wikipedia is the most important and sincere project ever undertaken. 
Wikipedia is in some trouble
I got involved with wikipedia very early. Watching it go from being laughed at, to becoming the center, and authority, of human knowledge, was one of the most revealing things in my life. It was obvious that it was working, but public perception lagged by five years or more. Wikipedia was an experiment that proved itself, back when every graph was going up and to the right. Some of the graphs are now going down.
Wikipedia's Markup language:
One of the central design decisions in wikipedia is that all information is stored in an editable document. This poses a huge number of challenges for caching and scaling wikipedia. It's not a database that you can run a script on. Worse, though, is that all of its content is buried in this ad-hoc, impenetrable, opaque, and mostly unparsable format. If wikipedia had used markdown, HTML, or some standardised format, any parser could flip it into whatever future format comes next. Wikipedia's custom language is just clearly insane: undocumented, hopeless. There's a team (of great people!) at wikimedia constantly working on it, unable to make any backwards-incompatible changes. I imagine their lives are hard. People are creating weird new syntax concepts all the time. Here's the markup for the first sentence of the Albert Einstein wikipedia article:

One of the main parsers for this syntax is called, fittingly, mwparserfromhell. DBpedia, the center of the semantic web, after years of work, has only ever offered limited parsing of categories and infoboxes. Much of the early years at Freebase were spent trying, with limited success, to parse wikipedia. I've spent years trying to parse it myself. I'm a shitty programmer. WolframAlpha, and many other serious companies, are using my parser, which is downright hilarious.
Yes, Arabic editors must write it right-to-left, somehow.
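To make the parsing pain concrete: wikitext templates nest, so the naive regex most people start with can't even pull a template out reliably. You need a depth counter. A toy sketch, in Python (the snippet of wikitext below is invented for illustration, not the real Einstein article source):

```python
import re

def find_templates(wikitext):
    """Extract top-level {{...}} templates by tracking nesting depth.
    A naive regex like r'{{.*?}}' stops at the first '}}' it sees,
    and splits any nested template in half."""
    templates, depth, start, i = [], 0, None, 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == '{{':
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == '}}' and depth > 0:
            depth -= 1
            if depth == 0:
                templates.append(wikitext[start:i + 2])
            i += 2
        else:
            i += 1
    return templates

# invented wikitext, with one template nested inside another:
text = "{{Infobox person|name=Albert Einstein|birth_date={{Birth date|1879|3|14}}}} was a physicist."
print(find_templates(text))                 # depth-aware: returns the whole outer template
print(re.findall(r'\{\{.*?\}\}', text))     # naive regex: truncates at the inner '}}'
```

And this is the easy part - it says nothing about tables, refs, pipes inside links, or escaping, which is why every serious attempt at parsing wikitext has been a multi-year slog.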
It's hard to describe how serious a problem this is. Wikipedia's content is never going to go anywhere, or be used by anything. Wikipedia may slowly die off - like myspace, or geocities - but its information will not go on. Play around in the official wikipedia android app: many pages are unreadable. There is a good deal of clearfix and table-span logic mushed right into the syntax. Most developers will not touch this kind of stuff. There will be no wikipedia 2.
Static copies of dynamic content
The contents of the English wikipedia dump (as of Jan 2019) are as follows:
Of the 14m records in the wikipedia dump, only 5.5m (40%) are public-facing articles. Yup. This does not include deleted pages, or old versions, either.

Redirects:

A computer-science 101 problem is to implement fuzzy string matching. There's usually a section in the textbook about it:
There are 8,550,441 redirects in wikipedia. They are mostly typos, or case-changes, and are mostly created by hand, every day. And what happens to a redirect when a page gets deleted, or merged, or spli- Yup.
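For reference, the textbook primitive in question is Levenshtein edit distance. A minimal dynamic-programming sketch - illustrative only, since wikipedia's redirects are of course created by hand rather than computed:

```python
def levenshtein(a, b):
    """Classic edit distance: the minimum number of single-character
    insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

# the kind of lookup that 8.5 million redirect pages do by hand:
print(levenshtein("Einstien", "Einstein"))          # a transposition: two simple edits
print(levenshtein("albert einstein", "Albert Einstein"))  # two case changes
```

Fifty-odd lines of this, behind a search box, would absorb most of the typo and case-change redirects automatically.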
Talk pages:

Wikipedia has 35m registered users. When a user joins, a bot will often send them a {{welcome}} template. Sometimes nice users will do it themselves. It looks like this. When this happens, a new user page is created, with a copy of this text each time. There are millions of examples of this in the dump: the same text, verbatim, over and over. The same process happens with 'Wikiprojects': bots go around creating a talk page and adding a template to it. The same process happens with deleted pages, fair-use warnings, and some bot edits. Each time, a new page is created, and boilerplate text gets thrown into it. Resulting in this:
So let's get things straight: in 1993, a small Japanese game company created this videogame:
• In 2008, a wikipedia user created the article, with two sentences and a link.
• In the 10 years since, the page has been edited 26 times, by bots.
• This created a talk page filled with 11 automated sentences.

A huge bulk of the wikipedia database is this boilerplate text. See this asteroid, this virus, or this judo club. And remember, if we wanted to change this text, we'd have to go and edit each of these pages - and because the syntax is so nuts, bots have a hard time making even simple stylistic changes without ruining a whole page. Oh, so what happens to these auto-generated talk pages when a page is deleted, merged, or spli- yup.

Automated articles:

There is a lot of disagreement about how much of wikipedia is generated by bots, and whether this matters. There's no way of knowing. Any boring-themed article with a few sentences, a reference, and an infobox probably won't get deleted. So nothing's stopping you from spitting out articles from a database of enzymes, or college rugby players, or season statistics for defunct sports teams.
I have nothing against bots, and I have nothing against the long-tail, but I think automated article-creation is responsible for a good amount of wikipedia's claimed growth over the past few years. We need to be up-front about this if we're talking about the health of the project. Here's the distribution of words in english wikipedia, by the size of articles:
Ok hey, I don't mind that the wikimedia developers didn't build a fancy search index in 2001. That's fine. Nobody could have predicted the success and scale of wikipedia early on.
What angers me though - and it should anger you - is that these problems have not been fixed in the 18 years since. God damn them. Well-meaning people are wasting their time on this every day. Any startup job interview asks questions about implementing a system like this. Any CS grad can build a lucene index to handle typos. Some of it is complicated; some of it is basic competence. It's annoying to whine, but at some point we're right to be angry at wikipedia - that it cannot find second gear, while the rest of the world is zipping along.
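For a sense of how standard this is: even Python's standard library can do typo-tolerant title lookup out of the box. A toy sketch - a stand-in for what a real Lucene-style fuzzy index does at scale, with an invented titles list:

```python
import difflib

# a tiny invented catalogue, standing in for 5.5m article titles:
titles = ["Albert Einstein", "Alberta", "Albert Camus", "Einsteinium"]

def fuzzy_lookup(query, titles, cutoff=0.6):
    """Rank stored titles by similarity to a (possibly misspelled) query.
    difflib uses sequence-matching ratios; a production index would use
    n-grams and edit distance, but the idea is the same."""
    return difflib.get_close_matches(query, titles, n=3, cutoff=cutoff)

print(fuzzy_lookup("Albert Einstien", titles))  # the misspelling still finds the article
```

This is a few lines of stdlib, not a research project - which is the point.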
In 2007 Danny Hillis raised $57 million,[1] bought out the entire MIT semantic-web group, hired ~50 employees (including this person, this person, this person), and got an office in the Mission. They reconciled all of wikipedia, the entire musicbrainz database, the entire open-library database, the tvdb database, and all of wordnet. They signed a (massive) deal to import all the collections of the Stanford library.[2] They hit high-90% classification of all 50 million entities (wikipedia has 5m). They were evaluated at very-high-90s accuracy by several third parties. Facebook, Bing, Amazon, and Google all began using their data, in nearly real-time, in their search products. This was one of the largest and most ambitious software projects in history. In 2010 Freebase was bought for a whack of money, and then killed off, by Google. When Google announced they would be shutting down the API, they offered to import all of this data into a new wikimedia project called wikidata. Wikidata was four-or-five Lua developers, in Germany, on a few research grants.
and they said no. 😐
They said this data didn't meet its guidelines regarding sourced data. ...Aren't you pulling information from wikipedia blindly? ...What about your (~60%) unreferenced facts? ...Aren't you multiplying vandalism from the multi-lingual wikipedias? ...How do you use, or verify, references? They built a tool to hand-transfer each freebase fact which, if you have a calculator, may seem funny. (At 10 people clicking full-time, it would have taken 10 million years.)

8 years later, Wikidata remains tiny, buggy, unused, and worse - majority unreferenced. I mean, they're pulling their data from wikipedia, which gets vandalized almost every minute! It has accomplished very little of its 5-year plan. They write academic papers. They still don't really offer a REST API. Creating new types or properties is, I think, possible? Or it's supposed to be. It's got few of the safeguards, momentum, features, and ambition that Freebase had a full decade ago. If wikidata were a company, it would not exist anymore, and you wouldn't have heard of it. But wikimedia places banner-ads on hours of eye-blistering user-created content, begging children, students, and poor people for money - and chooses to be this petty, pithy, and behind-the-times.
Categories:

It's a beautiful idea, to classify information with a category scheme - until it falls apart. Wikipedia has many thousands of categories. They loop around all over the place:
People: → Musicians → Singers → American_Idol → Books_about_American_Idol
or worse:
Albanian language: → Albanian-speaking countries and territories → Kosovo (region) → Kosovo → Kosovar society → Languages of Kosovo → Albanian language
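Loops like this are trivial to detect mechanically. A sketch of a depth-first cycle check over a hand-copied slice of the category graph (the graph dict below mirrors the example above; it is illustrative, not pulled from a live dump):

```python
def find_cycle(graph, start):
    """Walk a child -> parent category graph depth-first,
    and return the first cycle found as a list of category names."""
    def dfs(node, path, seen):
        if node in path:                      # we've looped back: report the cycle
            return path[path.index(node):] + [node]
        if node in seen:                      # already explored, no cycle through here
            return None
        seen.add(node)
        for parent in graph.get(node, []):
            cycle = dfs(parent, path + [node], seen)
            if cycle:
                return cycle
        return None
    return dfs(start, [], set())

# the Albanian-language loop from above, transcribed by hand:
categories = {
    "Albanian language": ["Albanian-speaking countries and territories"],
    "Albanian-speaking countries and territories": ["Kosovo (region)"],
    "Kosovo (region)": ["Kosovo"],
    "Kosovo": ["Kosovar society"],
    "Kosovar society": ["Languages of Kosovo"],
    "Languages of Kosovo": ["Albanian language"],
}
print(find_cycle(categories, "Albanian language"))
```

A nightly job running something like this over the full category table could flag every loop automatically; instead, the loops just sit there.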
If you're ever too cheerful, and wanting to feel depressed, pay a visit to Categories for deletion/Today, where you'll see precious human life spent debating whether 'Category:Goth' should exist, whether it is a genre of music, whether it is a fashion style, etc. Work is being ravenously deleted all the time. You'll get sad thinking about it.
what about all this stuff →
You're right. Wikipedia has good structured data in infoboxes, lists, tables, citations, etc. The issue is that, as of Feb 2019, wikipedia has 634,755 different kinds of templates (see this 21mb download). Yes, they are all different. Yes, there are templates-within-templates, with escaping-within-escaping. Even if you parse them perfectly, how do you know that for Template:HorseDeathYear, the third parameter is the birth date of the horse, and the fourth is the birth month? See, for example:

• Tennis Brackets vs Table Tennis Brackets
• 'Birth_date_and_age' vs 'Birth-date_and_age'
• a template for a trapezoid unicode symbol

It's just a straight-up mess. If you're thinking, gee, wikipedia editors must feel exhausted and stupid - you're right. Many devoted editors and admins have walked away from the project. None of the excited journalists that covered wikipedia's growth have covered the rage-quitting since. It's not an exciting headline.
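To make the parameter-ambiguity point concrete: even once you can split a template into its pieces, positional parameters carry no meaning at all. A toy parser for a flat (non-nested) template, using the hypothetical HorseDeathYear template from above, with invented values:

```python
def parse_template(tmpl):
    """Split a flat {{Name|a|b|key=val}} template into its name,
    positional params, and named params. Nothing in the syntax
    itself says what a positional param means."""
    inner = tmpl.strip()[2:-2]          # drop the outer {{ }}
    name, *parts = inner.split('|')
    positional, named = [], {}
    for part in parts:
        if '=' in part:
            key, _, val = part.partition('=')
            named[key.strip()] = val.strip()
        else:
            positional.append(part.strip())
    return name.strip(), positional, named

# invented values for the author's hypothetical horse template:
name, pos, named = parse_template("{{HorseDeathYear|Secretariat|1989|4|10}}")
print(name, pos, named)  # is '1989' a death year or a birth year? the syntax can't say
```

The parse succeeds, and you still know nothing: the meaning of each slot lives only in the template's documentation page - one of 634,755 of them.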
Wikipedia is socially-broken
Everyone has their favourite. The wikipedia debate that crushes me the most is the inclusionist/exclusionist debate. You'd think, after 16 years, and $141 million (often taxpayer) dollars, wikipedia would have mostly decided what it is, and isn't, supposed to be. This is not the case. Every day, 1,000 pages are deleted from wikipedia. A small percentage of these are vandalism. The bulk are articles about places, people, bands, and tv-shows that some wikipedia editors choose to delete.

Being a wikipedia editor is a thankless and brutal task, and mostly everybody does it out of good will. What happens though, and what I've observed, is that stressed-out editors begin to take pleasure in deleting incomplete pages. They begin to see new pages as a liability. You can understand that eventually, this is what wiki-burnout feels like.

Consider Jimmy Wales (the founder), who, in 2007, created a wikipedia page for the Mzoli's Meat restaurant in Cape Town, and had it deleted. Consider Donna Strickland, who won a Nobel prize for her work in 2018, but had her wikipedia page deleted for notability. My Member of Parliament had her wikipedia page deleted, for notability, during the election. Check out this talk page for a crater on the moon. In 2018, TIFF organized a weekend edit-a-thon, where volunteers researched and contributed articles about Canadian cinema. Their edits were flagged as coming from a corporate IP, and deleted, on the premise that they were biased.

Nobody cares about this. Nothing is going to improve. If you think wikipedia is a progressive, cute community of well-meaning friends, I'm sorry. Wikipedia is in full social failure.

Design:

I'm not the person to spend time criticising UI decisions, but unsolicited wikipedia redesigns are the bread-and-butter of every designer on dribbble. I'll just say, here are two of my favourites:
by 1910 design studio
by Aurélien Salomon
If you think 'Wikipedia couldn't just change, it's more complicated than that' - well, you'd be wrong. Nothing is stopping it from changing. It's just unable to, from social failure.
Wikimedia is a moral failure
On January 18, 2012, wikimedia made the bold move of blacking out its website to protest the American SOPA/PIPA bills. This was the first time in its history it had used its tremendous political and social clout for any kind of moral stance. (It seems a lot less brave when you remember that this came after Reddit, Google, Mozilla, Yahoo, and the EFF.) Wikimedia is one of the top websites on Alexa, has a clear social mandate, and enormous resources, and has done nothing with them at all. Compare it to Mozilla, with its slim punk-rock attitude, which has relentlessly defended every aspect of the internet, against enormous odds. Consider the EFF, which has fought, and won, some of the scariest lawsuits ever. The turnover rate of wikimedia upper-management is comical. I've spoken to wikimedia employees who had never heard of dbpedia. They're probably the largest tech company ever to not hire a designer. They run an academic conference.
The story that really got to me, over the years, was the story of the Visual Editor. In 2012, under new management, wikimedia did a deep analysis of its problems, and built a 5-year plan. Among its priorities was addressing the obstacles to onboarding new editors. They identified that learning the terse wikimedia syntax was a barrier to becoming a new user, and started, in earnest, to create a WYSIWYG visual editor. They hired developers, spec'd out the project in detail, and committed to monthly public progress updates. After a year of development, when it was completed, wikipedia editors voted en masse against integrating the visual editor. This organization couldn't stand up and do one thing if its life depended on it. Take a moment to browse Times_that_100_Wikipedians_supported_something, if you are feeling good about wikipedia.
I felt awful writing this. I fell right into the 'bitter blog' trope that I can hardly stand. But seriously, this project is in such poor shape, and it angers me like nothing else does.
I love a good personal blog post. One of the best is Daniel Huffman's post about returning to grad school later in life, and flunking out.

Then he talks about making this map for wikipedia, and how rewarding an experience it was to make, as a total amateur.
Wikipedia's in trouble. It's in real trouble, and few people realize it. If it stops being a thing that people do, it will be very sad.