How to split a long article and store it in the database for easy retrieval and paging?

Question


Suppose there is a long article (say 100,000 words), and I need to write a PHP file that displays page 1, page 2, or page 38 of it:

display.php?page=38

However, the number of words per page can change over time (for example, it is 500 words per page right now, but next month we might change it to 300 words per page). What is a good way to split a long article and store it in a database?

PS. The design gets more complex if we want to display 500 words per page but keep paragraphs whole. That is, if we have shown 480 words and the current paragraph has 100 more words left, show those 100 words anyway, even though that exceeds the 500-word limit (and the next page should then not show those 100 words again).
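The whole-paragraph rule from the PS could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function name `paginateParagraphs` is hypothetical, paragraphs are assumed to be separated by blank lines, and words are counted with PHP's `str_word_count`, which ignores numbers.

```php
<?php
// Hypothetical helper: group paragraphs into pages of roughly $limit words.
// A paragraph is never split; a page is closed once it reaches the limit.
function paginateParagraphs(string $text, int $limit): array
{
    // Assumption: paragraphs are separated by one or more blank lines.
    $paragraphs = preg_split('/\R\s*\R/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $pages = [];
    $current = [];
    $count = 0;
    foreach ($paragraphs as $paragraph) {
        $current[] = $paragraph;
        $count += str_word_count($paragraph);
        if ($count >= $limit) {
            // Limit reached or exceeded: the paragraph stays whole, the page ends here.
            $pages[] = implode("\n\n", $current);
            $current = [];
            $count = 0;
        }
    }
    if ($current) {
        // Remaining paragraphs form the last, shorter page.
        $pages[] = implode("\n\n", $current);
    }
    return $pages;
}
```

Because each paragraph is assigned to exactly one page, the overflow words are never repeated on the following page.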

+4

database php database-design schema

太極者無極而生 May 31, '09 at 20:27


6 answers

Of course, you could output exactly 500 words per page, but a better way would be to place break markers in your article (at the end of a sentence or paragraph), wherever a break would read well. Your pages will then not have exactly X words each, but approximately or up to X, and no sentence or paragraph will be cut in the middle. Of course, the break markers themselves are not shown when the pages are displayed.
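A minimal sketch of the break-marker idea, under stated assumptions: the marker string `<!--break-->` and the function name `pagesFromMarkers` are made up for illustration, and a page is filled with consecutive segments for as long as it stays within the word limit.

```php
<?php
// Hypothetical marker that the author inserts at good break positions.
const BREAK_MARKER = '<!--break-->';

// Build pages of up to $maxWords words, cutting only at markers.
function pagesFromMarkers(string $text, int $maxWords): array
{
    // explode() drops the markers, so they are never displayed.
    $segments = explode(BREAK_MARKER, $text);

    $pages = [];
    $current = '';
    foreach ($segments as $segment) {
        $candidate = $current . $segment;
        if ($current !== '' && str_word_count($candidate) > $maxWords) {
            // Adding this segment would exceed the limit: start a new page.
            $pages[] = $current;
            $current = $segment;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $pages[] = $current;
    }
    return $pages;
}
```

A single segment longer than the limit still becomes one page of its own, since text is never cut anywhere except at a marker.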

+2

schnaader May 31, '09 at 20:33


You might want to start by breaking the article into an array of paragraphs. The split function (http://www.php.net/split) was deprecated in PHP 5.3 and removed in PHP 7, so use explode for a plain string delimiter:

 $array = explode("\n", $articleText);

+1

Travis May 31, '09 at 20:39


The best way is to cut the text manually, because it is not advisable to let a program decide where to cut. Sometimes it will cut immediately after an h2 tag and continue the text on the next page.

This is a simple database structure for this:
article (id, title, time, ...)
article_body (id, article_id, page, body, ...)

SQL query:

 SELECT a.*, ab.body, ab.page
 FROM article a
 INNER JOIN article_body ab ON ab.article_id = a.id
 WHERE a.id = $article_id AND ab.page = $page
 LIMIT 1;

In an application, you can use jQuery to simply add a new text area for another page ...

+1

sasa May 31, '09 at 20:57


Your table may be something like

 CREATE TABLE ArticleText (
   artId INTEGER,
   wordNum INTEGER,
   wordId INTEGER,
   PRIMARY KEY (artId, wordNum),
   FOREIGN KEY (artId) REFERENCES Articles,
   FOREIGN KEY (wordId) REFERENCES Words
 )

This may of course be very space-expensive, or slow, etc., but you will need some measurements to determine that (as so much depends on your DB engine). By the way, I hope it is clear that the Articles table is just a table of article metadata keyed by artId, and the Words table is a table of all the words in every article keyed by wordId (trying to save some space there by identifying already-known words when an article is entered, if feasible). One special word must be the "end of paragraph" marker, easily identifiable as such and distinct from every real word.

If you structure your data like this, you get a lot of flexibility in retrieving by page, and the page length can be changed whenever you like, even query by query. To get a page:

 SELECT wordText
 FROM Articles
 JOIN ArticleText USING (artId)
 JOIN Words USING (wordId)
 WHERE wordNum BETWEEN (@pagenum - 1) * @pagelength AND @pagenum * @pagelength + @extras
   AND Articles.artId = @articleid
 ORDER BY wordNum

The parameters @pagenum, @pagelength, @extras and @articleid are to be bound into the prepared query at query time (use whatever syntax your database and language require, for example :extras or numbered parameters).

So, we fetch @extras words beyond the expected end of the page, and then on the client side we check those extra words to see whether one of them is the end-of-paragraph marker. If none is, we issue another query (with different BETWEEN values) to get more.

Far from ideal, but given all the issues you have highlighted, it is worth considering. If you can count on the page length always being, for example, a multiple of 100, you can adopt a slight variation of this based on 100-word chunks (and no Words table, just the text stored directly in each row).
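The client-side check of the extra words might look something like this. It is a sketch under assumptions: the marker value `PARA_END` and the function name `trimToParagraphEnd` are invented for illustration, and `$words` is taken to hold the page's words followed by the extras, in wordNum order.

```php
<?php
// Hypothetical end-of-paragraph marker word (a pilcrow), distinct from every real word.
const PARA_END = "\u{00B6}";

/**
 * Returns the words to display, ending at the first paragraph marker found
 * at or after the nominal end of the page, or null when no marker is found
 * (the caller should then query for more words).
 */
function trimToParagraphEnd(array $words, int $pageLength): ?array
{
    if (count($words) < $pageLength) {
        // The article ended before the page filled up; show what there is.
        return $words;
    }
    // Start at the last regular word: the page may end exactly on a boundary.
    for ($i = max($pageLength - 1, 0); $i < count($words); $i++) {
        if ($words[$i] === PARA_END) {
            return array_slice($words, 0, $i + 1);
        }
    }
    return null;
}
```

Working out where the following page has to start, so that the overflow is not shown twice, is left out of this sketch.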
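For the 100-word-chunk variation, the arithmetic for finding which stored chunks make up a page can be sketched as below. The names `CHUNK_WORDS` and `chunkRangeForPage` are hypothetical, and chunks are assumed to be numbered from zero.

```php
<?php
// Assumption: the article is stored in rows of 100 words each, numbered from 0.
const CHUNK_WORDS = 100;

// Returns [first, last] chunk numbers (inclusive) that make up the given page.
function chunkRangeForPage(int $pageNum, int $pageLength): array
{
    if ($pageLength <= 0 || $pageLength % CHUNK_WORDS !== 0) {
        throw new InvalidArgumentException(
            'Page length must be a positive multiple of ' . CHUNK_WORDS
        );
    }
    $chunksPerPage = intdiv($pageLength, CHUNK_WORDS);
    $first = ($pageNum - 1) * $chunksPerPage;
    return [$first, $first + $chunksPerPage - 1];
}
```

For example, page 38 at 500 words per page maps to chunks 185 through 189, and switching to 300 words per page only changes the range that is requested, not the stored rows.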

+1

Alex Martelli May 31, '09 at 22:12


Let the author split the article into parts.

Writers know how to make an article interesting and readable by dividing it into logical parts, such as "Part 1-Installation", "Part 2-Configuration", etc. Having an algorithm do it is a bad solution, imho.

Chopping up an article in the wrong place just annoys the reader. Don't do it.

my 2 ¢


+1

0scar Jun 01 '09 at 14:00


artemb · Accepted Answer · 2009-05-31T20:51:28+0000

I would do this by dividing articles into chunks at the time they are saved. The saving script would split the article using whatever rules you build into it, and save each chunk in a table like this:

 CREATE TABLE article_chunks ( article_id int not null, chunk_no int not null, body text )

Then, when you load the article page:

 // Cast both values to int so that user input cannot inject SQL.
 $sql = "select body from article_chunks where article_id = " . (int)$article_id . " and chunk_no = " . (int)$page;

Whenever you want to change the logic for dividing articles into pages, you run a script that joins all the chunks back together and splits them again under the new rules.

UPDATE: In giving this advice, I am assuming that your application is more read-intensive than write-intensive, meaning that articles are read more often than they are written.
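A minimal sketch of such a re-splitting step, assuming the old chunk bodies have already been loaded in chunk_no order. The function name `rechunk` is hypothetical, and the splitting rule used here (a fixed number of words per chunk, with whitespace collapsed) is only a placeholder for whatever rules you actually use.

```php
<?php
// Hypothetical re-chunking step: join the old chunks and cut them again.
// $oldBodies must be the chunk bodies ordered by chunk_no.
function rechunk(array $oldBodies, int $wordsPerChunk): array
{
    // Rebuild the whole article from its pieces.
    $article = implode(' ', $oldBodies);

    // Placeholder rule: fixed number of whitespace-separated words per chunk.
    $words = preg_split('/\s+/', $article, -1, PREG_SPLIT_NO_EMPTY);

    $rows = [];
    foreach (array_chunk($words, $wordsPerChunk) as $i => $chunkWords) {
        // chunk_no starts at 1 so that it matches the page number in the URL.
        $rows[] = ['chunk_no' => $i + 1, 'body' => implode(' ', $chunkWords)];
    }
    return $rows;
}
```

The returned rows would then replace the article's existing rows in article_chunks, ideally inside a single transaction so readers never see a half-updated article.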
