I work in the travel industry as an architect / software lead on exactly the kind of project you describe - in our region we work directly with suppliers, but for outgoing travel we connect to several aggregators.
To answer your question ... some data you have, some you get in various ways, and some you have to torture and twist until it confesses.
What is your angle?
The questions you need to ask are ... Do you want to sell advertising like Kayak, or do you take a cut like Expedia? Are you into searching for travel services or into selling them? Do you target a niche (for example, only air travel) or everything (accommodation, airlines, car rental, extra services such as transfers / excursions / conferences, etc.)? Are you targeting a region (the USA or part of the USA) or the world? How deep do you go - do you just show several sites on one screen, or do you combine different services and dynamically package them?
Data retrieval
If you go with the Kayak business model, you technically do not need the site's permission ... but many sites have affiliate programs with IFrames or other simple ways to direct customers to their site. On the plus side, you don't have to deal with payments, complaints, or the travelers themselves. As for the minuses ... if you want to compare prices yourself and present the cheapest option to the user, you will have to integrate at a deeper level, which means APIs and web scraping.
As for web scraping ... avoid it. It sucks. Really. Just don't do it. Trust me on that one. Unfortunately, some things, like low-cost carriers, you cannot get without web scraping. Low-cost airlines live on extra services. If the user does not see their site, they do not sell extras, and they earn nothing. Therefore they do not have affiliates, they do not offer an API, and they change their site layout almost constantly. However, there are companies that make a living by scraping the low-cost carriers' sites and wrapping them in nice APIs. If you can afford them, you can let your users compare the prices of low-cost flights, and that is a huge thing.
On the other hand, there are "normal" carriers that offer an API. Getting access to the airlines is not such a big problem, since they are all united under IATA; basically, you buy from IATA, and IATA distributes the money to the carriers. However, you probably do not want to connect directly to the carriers' network. These days they have web services and SOAP, but believe me when I say that there are SOAP protocols that are just insanely thin wrappers around a text prompt through which you interact with a mainframe using a protocol from the 80s (think of a Unix prompt where you are billed per command, and it takes about 20 commands for one search). That's why you probably want to connect to someone a bit further down the chain, with a better API.
Airlines, therefore, sit at both extremes of the Gaussian curve; on the one hand there are individual suppliers, and on the other there are centralized systems where you implement one API and can fly anywhere in the world. Accommodation and the rest of the travel products sit in between. There are several major players that aggregate hotels, and a ton of small suppliers with a multitude of aggregators that cover only part of the spectrum. For example, you can rent a lighthouse, and it's not even that expensive, but you can't compare the prices of different lighthouses in one place.
If you work in the Kayak business model, you are likely to end up scraping websites. If you want to integrate different vendors, you'll often work with APIs, some of which are pretty good and most of which are bearable. I have not worked with RSS, but there is not much difference between RSS and web scraping. There is also a fourth option not mentioned in Jeff's answer ... one where you receive your data nightly, for example as CSV files via FTP.
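To make the nightly-feed option a bit more concrete, here is a minimal sketch of loading such a file into a price lookup. The column names, the sample rows, and the `load_price_feed` helper are all assumptions for illustration; real supplier exports differ, and the download step (for example via Python's `ftplib`) is left out.

```python
import csv
import io
from decimal import Decimal

# Hypothetical nightly export; the column names are assumptions, not a real supplier format.
SAMPLE_FEED = """hotel_id,date,room_type,price_eur
H1,2024-07-01,double,30.00
H1,2024-07-01,single,20.00
H2,2024-07-01,double,45.50
"""

def load_price_feed(text):
    """Parse a CSV export into {(hotel_id, date, room_type): price in cents}."""
    prices = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["hotel_id"], row["date"], row["room_type"])
        # Keep money as integer cents; Decimal avoids float rounding on parse.
        prices[key] = int(Decimal(row["price_eur"]) * 100)
    return prices

prices = load_price_feed(SAMPLE_FEED)
```

The trade-off with this approach is freshness: searches are fast because they hit your own copy, but the data is only as current as the last night's file.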
Life sucks (mini rant)
And then there is the complexity. The more value you want to add, the more complexity you will have to handle. Can you search for accommodation that allows pets? For a hostel located less than 5 km from the city center? Are you combining flights, and can you guarantee that the traveler will have enough time to get from one airport to another ... Can you sell the transfer in advance? The famous cellist does not want to part with his precious 18th-century cello; can you sell him another seat for the cello (yes, I am not making this up)?
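The "pets allowed, less than 5 km from the center" search could be sketched roughly like this, assuming each supplier record carries coordinates and a pets flag (which, in practice, many do not). The field names, the coordinates, and the `search` helper are hypothetical; the distance uses the standard haversine great-circle formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def search(properties, center, max_km=5.0, pets_required=True):
    """Return names of properties within max_km of center that satisfy the pets filter."""
    results = []
    for p in properties:
        # A missing flag is treated as "no": we cannot promise what the supplier did not state.
        if pets_required and not p.get("pets_allowed", False):
            continue
        if haversine_km(center[0], center[1], p["lat"], p["lon"]) <= max_km:
            results.append(p["name"])
    return results

# Hypothetical sample data.
CENTER = (48.1374, 11.5755)
HOSTELS = [
    {"name": "Hostel A", "lat": 48.1400, "lon": 11.5600, "pets_allowed": True},
    {"name": "Hostel B", "lat": 48.2000, "lon": 11.8000, "pets_allowed": True},
    {"name": "Hostel C", "lat": 48.1300, "lon": 11.5800},  # pets flag missing
]
found = search(HOSTELS, CENTER)
```

The filter itself is the easy part; the hard part is that each supplier describes (or omits) these attributes differently.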
Want to compare prices? Sure, the room is 30 euros per night. But you can get one double for 30 and one single for 20, or you can get one double with an extra bed and a 70% discount for the third person. But only if it is a child under the age of 12; our extra beds are not suitable for adults. And you will not get the price for the extra bed in the search results - only when calculating the final price.
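A rough sketch of why the search price and the final price diverge for a party of three, using the room prices above. One loud assumption: the text does not say what the 70% discount applies to, so here it is applied to the per-person share of the double (15 euros). The function name and constants are made up for illustration.

```python
# Prices in integer cents to avoid float rounding.
DOUBLE_CENTS = 3000
SINGLE_CENTS = 2000
EXTRA_BED_DISCOUNT_PCT = 70
EXTRA_BED_MAX_AGE = 12  # exclusive: the extra bed is only for children under 12

def cheapest_for_three(third_person_age):
    """Return (option name, total in cents) for two adults plus a third person."""
    options = {"double + single": DOUBLE_CENTS + SINGLE_CENTS}
    if third_person_age < EXTRA_BED_MAX_AGE:
        # ASSUMPTION: the discount is taken off the per-person share of the double.
        per_person = DOUBLE_CENTS // 2
        extra_bed = per_person * (100 - EXTRA_BED_DISCOUNT_PCT) // 100
        options["double + extra bed"] = DOUBLE_CENTS + extra_bed
    name = min(options, key=options.get)
    return name, options[name]

with_child = cheapest_for_three(8)
with_adult = cheapest_for_three(30)
```

Under these assumptions the same "30 euros per night" room ends up at 34.50 for a family with a child and 50.00 for three adults, and you only find out which after the final price calculation.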
And don't even get me started on dynamic packaging. Want to sell accommodation + car rental? No problem; integrate with two different providers, and off you go ... manually updating the list of locations in the city (from the car rental provider) to match the hotels (from the accommodation provider, which gives you only the city for each hotel). Provided, of course, that you have already matched the lists of cities from the two, as there is no international standard for city codes.
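For the city-matching part, a minimal sketch of a first automatic pass might look like this: normalize the names, match on country plus normalized name, apply a hand-maintained override table, and leave the rest for a human. The provider codes, the override entry, and the function names are all hypothetical.

```python
import unicodedata

def normalize(name):
    """Strip accents, case, and extra whitespace so 'München' and 'Munchen' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())

# Hand-maintained exceptions for names that differ by language (hypothetical entry).
MANUAL_OVERRIDES = {("DE", "cologne"): ("DE", "koln")}

def match_cities(hotel_cities, car_cities):
    """Both inputs are lists of (provider_code, country, name).

    Returns ({hotel_code: car_code}, [unmatched hotel codes]).
    """
    index = {(country, normalize(name)): code for code, country, name in car_cities}
    matched, unmatched = {}, []
    for code, country, name in hotel_cities:
        key = (country, normalize(name))
        key = MANUAL_OVERRIDES.get(key, key)
        if key in index:
            matched[code] = index[key]
        else:
            unmatched.append(code)  # needs manual review
    return matched, unmatched

# Hypothetical sample data from the two providers.
HOTEL_CITIES = [("MUC1", "DE", "München"), ("KLN", "DE", "Cologne"),
                ("ZAG", "HR", "Zagreb "), ("SPU", "HR", "Split")]
CAR_CITIES = [("c-17", "DE", "Munchen"), ("c-22", "DE", "Köln"), ("c-31", "HR", "zagreb")]
matched, unmatched = match_cities(HOTEL_CITIES, CAR_CITIES)
```

This only gets you to city level; matching pickup locations inside a city to individual hotels is still the manual work described above.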
Unlike many other industries that have many products, the travel industry has many very complex products. Amazon has it easy; selling books and selling potatoes is the same; you can even ship them in one box. They combine easily and are not assembled from many parts. :)
PS Link to an interesting recent thread on Hacker News with some insider information about flights.
PPS I recently came across a great, albeit rather old, blog post on the IATA NDC protocol, with an overview of how the travel industry is connected and a history lesson on how it got that way.