This feature looks at some of the first mentions of famous names in The Times. Have an idea or someone you’d like to read about? Leave a suggestion in the comments section.
“A cross between a choir boy and a beatnik,” with a voice that was “anything but pretty”: that was the 20-year-old Bob Dylan as he first struck the Times music critic. That critic, Robert Shelton, helped launch Mr. Dylan’s career with his Sept. 29, 1961, profile:
Soon swept up in the McCarthy era – he was confused with the reporter Willard Shelton and refused to testify when subpoenaed by a Senate subcommittee – Mr. Shelton developed an interest in folk music and protest songs. That interest led to another mention for Mr. Dylan, for “The Ballad of Emmett Till,” in the third paragraph of an Aug. 20, 1962, article on the civil rights movement:
Mr. Shelton was also there when Mr. Dylan turned heads at the 1965 Newport Folk Festival by swapping his acoustic guitar for an electric one:
Mr. Shelton wrote the liner notes for Mr. Dylan’s first album under the pseudonym “Stacey Williams,” and he later wrote a biography, “No Direction Home: The Life and Music of Bob Dylan.”

The New York Times recently celebrated its 20th anniversary on the web. Today’s digital platform is, of course, very different from what it was decades ago, and we needed to bring the presentation of our archival content up to date.
In 2014, we launched an overhaul of our entire digital platform that gave readers a modern, seamless experience, with improvements like faster page loads, responsive layouts, and live updates. But while the new design improved the reading experience for new stories, engineering and resource constraints prevented us from migrating previously published stories to it.
Today we are happy to announce that, thanks to the team’s efforts, nearly every article published between 1851 and 2006 is available to our readers in a new, updated format.
As is often the case, the seemingly simple task of content migration quickly grew into a large and complex project with many technical challenges. It turns out that translating some 14 million articles published between 1851 and 2006 into a format that matches our current reading experience is not easy.
At first, the problem seems simple: we have an XML archive, and we need to convert it into a JSON format that our CMS can ingest. Most of our archival data from 1851 to 1980 is complete enough in the XML files that all we need to do is transform the XML and rewrite it in the new format, as in the sketch below.
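For illustration, here is a minimal version of that transform in Python. The archive’s real element names and our CMS’s JSON shape are not described in this post, so the simplified, NITF-like schema and every field name below are hypothetical stand-ins.

```python
# A minimal sketch of the XML-to-JSON step. All tag and field names
# here are hypothetical; the real archive schema is not public.
import json
import xml.etree.ElementTree as ET

def article_xml_to_json(xml_string: str) -> str:
    root = ET.fromstring(xml_string)
    doc = {
        # findtext() returns None when an element is missing,
        # which maps naturally onto JSON null.
        "headline": root.findtext("head/title"),
        "pub_date": root.findtext("head/pubdate"),
        "byline": root.findtext("body/byline"),
        # Collect every paragraph of body text, in order.
        "body": [p.text or "" for p in root.iter("p")],
    }
    return json.dumps(doc, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    sample = """<article>
      <head><title>Example Headline</title><pubdate>1912-04-16</pubdate></head>
      <body><byline>By A REPORTER</byline><p>First paragraph.</p><p>Second.</p></body>
    </article>"""
    print(article_xml_to_json(sample))
```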
The story from 1981 to 2006 is more complicated. We compared the articles available in XML against the articles currently served on the web and found that in 2004 alone, more than 60,000 articles on our site had no counterpart in the XML archive. There are likely hundreds of thousands of articles from 1981 onward that exist only online and are missing from the archive, which reflects only what ran in print. This was a problem: every missing article would surface as a 404 error page, hurting both the user experience and our search engine rankings.
To migrate our archive effectively, we needed to compile a “canonical” list of every article that appears on the site. To build this list, we combined multiple data sources, including sitemaps, internal logs, and databases of book, movie, and hotel reviews.
With a well-defined list of articles in hand, it became clear that for anything not in our XML repository, we would need to extract structured data from raw HTML. A sketch of that comparison follows.
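The post does not show how the canonical list was assembled, so here is one plausible shape for it: normalize URLs from each source, merge them into a set, and diff that set against the URLs the XML archive already covers. The source names and sample URLs are invented for illustration.

```python
# A sketch of assembling the "canonical" article list and finding
# articles that must be recovered from raw HTML. Data sources and
# sample URLs are hypothetical.
from urllib.parse import urlsplit

def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial variants
    (http vs. https, trailing slashes) don't look like new articles."""
    parts = urlsplit(url.strip().lower())
    return parts.netloc + parts.path.rstrip("/")

def canonical_list(*sources) -> set:
    """Merge URL lists from several data sources into one de-duplicated set."""
    urls = set()
    for source in sources:
        urls.update(normalize(u) for u in source)
    return urls

if __name__ == "__main__":
    sitemap_urls = ["https://www.nytimes.com/2004/02/12/us/example-story.html"]
    review_db_urls = ["http://www.nytimes.com/2004/02/12/us/example-story.html/"]
    xml_archive_urls = []  # articles the XML archive already knows about

    site = canonical_list(sitemap_urls, review_db_urls)
    known = {normalize(u) for u in xml_archive_urls}
    # Whatever is on the site but not in the archive has to be
    # extracted from raw HTML instead.
    print(site - known)
```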
The archive migration pipeline. The red boxes are inputs – the XML archive and the list of URLs we compiled; the blue boxes are intermediate records; and the green boxes are the final results – successfully processed JSON, plus records skipped because of errors.
Our CMS stores a great deal of metadata about each article – publication date, section, topic, byline, dateline, summary, and so on – and we needed a way to extract that metadata, along with the article text itself, from the raw XML and HTML. We used Python’s built-in xml.etree.ElementTree parser for the XML and BeautifulSoup for the HTML; a sketch of the HTML side appears below.
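The post names the tools but not the page templates, which varied over the years, so the tag and meta-attribute names in this BeautifulSoup sketch are hypothetical guesses rather than the actual extraction rules.

```python
# A sketch of pulling metadata and body text out of an archived HTML
# page with BeautifulSoup. Meta names like "pdate" are hypothetical.
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def extract_article(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")

    # Older article pages often carried metadata in <meta> tags;
    # gather whichever ones happen to be present.
    meta = {
        tag["name"]: tag["content"]
        for tag in soup.find_all("meta")
        if tag.get("name") and tag.get("content")
    }

    headline = soup.find("h1")
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]

    return {
        "headline": headline.get_text(strip=True) if headline else None,
        "pub_date": meta.get("pdate"),        # hypothetical meta name
        "summary": meta.get("description"),
        "body": paragraphs,
    }
```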
As part of the migration, we also created new, SEO-friendly URLs for the older content so readers can find our historical articles more easily. SEO-friendly URLs typically include keywords related to the page’s content, a practice that was not in place when many of these articles were first published online.
For example, a Feb. 12, 2004, story about San Francisco’s mayor legalizing gay marriage lived at a URL ending in “12CND-FRIS.html.” Realizing we could provide a far more descriptive link, we derived a new URL from keywords in the headline, so the article’s address now ends with a readable, headline-based slug.
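The exact slug rules The Times used aren’t described here, but a typical headline-to-slug transformation looks like this minimal sketch:

```python
# A generic "slugify" sketch: lowercase the headline and collapse
# runs of non-alphanumeric characters into single hyphens.
import re

def slugify(headline: str) -> str:
    slug = headline.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # non-alphanumerics -> one hyphen
    return slug.strip("-")

print(slugify("San Francisco Mayor Legalizes Gay Marriage"))
# -> san-francisco-mayor-legalizes-gay-marriage
```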
Once we had identified the URLs missing from our archive, we realized we had a new problem: duplicate content. Some of the “missing” URLs pointed to HTML documents whose content already existed in our XML repository. If we converted both the XML and the HTML to JSON without detecting these duplicates, many articles would end up with more than one URL, and the duplicate pages would compete with each other for search ranking.
Clearly, we needed to match each piece of XML content to its corresponding HTML content. This posed another challenge: we could not rely on exact string comparison, because the XML and HTML versions of the same article often differ slightly – stray markup here, extra boilerplate text there – even when they are substantively identical. To solve this, we reused an algorithm developed for another project, TimesMachine, based on text “shingling.” You can read more about that process here; a simplified version appears below.
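As a rough illustration of the idea (the linked write-up describes the real algorithm), here is a minimal shingling sketch using word-level shingles and Jaccard similarity; the shingle size and threshold are arbitrary choices, not the production values.

```python
# A minimal sketch of shingle-based near-duplicate detection.
def shingles(text: str, k: int = 4) -> set:
    """Every k-word window of the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}

def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def same_article(xml_text: str, html_text: str, threshold: float = 0.8) -> bool:
    # Small markup or boilerplate differences shift only a few
    # shingles, so near-duplicates still score close to 1.0.
    return jaccard(shingles(xml_text), shingles(html_text)) >= threshold
```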
This method worked very well at matching “plain” HTML text to existing XML records. Of the 60,000 previously missing articles from 2004, for example, this step matched 42,000, eliminating 70 percent of the potential duplicates. The remaining 30 percent of the articles were extracted from raw HTML with BeautifulSoup.
Although our original goal was simply to modernize our digital archive, the migration project has opened the door to future projects that engage our readers with our vast trove of historical content.
For example, we recently expanded TimesMachine, our virtual microfilm reader, to include issues from 1981 through 2002. Articles from 1851 to 1980, however, are still available only in scanned form on TimesMachine; full digital text would take that experience further.
We are currently working with a vendor to close the digitization gap between the 1960s and the 1980s so readers can quickly find, explore, and experience content from across our history. The good news: we have already released the full digital text of articles written in the 1970s.
We will continue to update NYTimes.com with new and improved archival articles in the coming months. Stay tuned! You can follow @NYTArchives on Twitter for updates.
Earlier this year, we quietly expanded TimesMachine, our virtual microfilm reader, to include every issue of The New York Times published between 1981 and 2002. Before this expansion, TimesMachine covered every issue published between 1851 and 1980 – more than 11 million articles spread across more than 2.5 million pages. The expansion adds 8,035 issues containing 1.4 million articles on 1.6 million pages.
Building and deploying TimesMachine presented many interesting technical challenges, and in this post we will explain how we tackled two of them. First, we will discuss the central challenge of TimesMachine: giving the user a full, high-resolution view of a day’s newspaper without forcing them to download hundreds of megabytes of data. Then we will discuss a fascinating string-matching problem we had to solve to bring articles published after 1980 into TimesMachine.
Before TimesMachine launched in 2014, scanned articles from the archive were available to subscribers only as individual PDF documents. The archive was accessible, but this implementation had two significant shortcomings: context and user experience.
Separating a story from its surrounding content strips away the context in which it was published. A modern reader might learn that on July 20, 1969, a man named John Fairfax became the first person to row solo across the Atlantic Ocean. But a reader of The New York Times that morning would likely have been far more captivated by the front-page news that Apollo 11, carrying Neil Armstrong, had entered lunar orbit in preparation for the first moon landing. Knowing where the John Fairfax story sat in the paper (the lower-left corner of the front page) and what else happened that day makes the article far more interesting and useful to a historian than the article in isolation.
We wanted to present the archive in all its glory, as it was meant to be seen on the day of publication – one issue at a time. And we wanted browsing to feel fast and seamless, without forcing users to wait on enormous high-resolution downloads. A common technique for that is sketched below.
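This post does not spell out TimesMachine’s production pipeline, but the standard way to serve gigantic scans is a map-style tile pyramid: pre-cut each zoom level into small tiles so the viewer fetches only the handful of tiles in view, never the full scan. Here is a generic sketch of that technique using Pillow; the file layout and level count are illustrative assumptions.

```python
# A generic tile-pyramid sketch (not TimesMachine's actual pipeline):
# cut each zoom level of a page scan into 256-pixel tiles.
from PIL import Image  # pip install Pillow

TILE = 256  # pixels per tile edge, as in most slippy-map viewers

def build_pyramid(scan_path: str, out_prefix: str, levels: int = 4) -> None:
    base = Image.open(scan_path)
    for z in range(levels):
        # Level 0 is fully zoomed out; each level doubles the resolution.
        factor = 2 ** (levels - 1 - z)
        img = base.reduce(factor) if factor > 1 else base
        cols = (img.width + TILE - 1) // TILE
        rows = (img.height + TILE - 1) // TILE
        for x in range(cols):
            for y in range(rows):
                box = (x * TILE, y * TILE,
                       min((x + 1) * TILE, img.width),
                       min((y + 1) * TILE, img.height))
                img.crop(box).save(f"{out_prefix}_z{z}_x{x}_y{y}.png")
```

A viewer then requests tiles by zoom level and grid position as the reader pans, so bandwidth scales with the viewport rather than with the size of the scanned page.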