PHP Data Services Extract Content from Drupal Database

Archived Content
This article is provided for historical perspective only, and may not reflect current conditions. Please refer to the relevant product page for more up-to-date product information and resources.
  • By Dimuthu LNAME
  • 7 Jan, 2009

Background

Drupal is one of the most popular open source content management systems. Many enterprises have deployed it to run their Web portals and engage with communities. WSO2's SOA developer portal, WSO2 OxygenTank, which hosts content such as project pages, blogs, forums and other community activity, also runs on Drupal. Beyond this functionality, the data stored in Drupal can be extracted to calculate parameters such as the rate of content submission and the rate of community involvement, which have great marketing value.

WSO2 Web Services Framework for PHP (WSF/PHP) is an enterprise-grade Web service library for PHP developers, with wide support for the WS-* stack. It also comes with a built-in data services capability. This capability can be used to develop Web services around a Drupal database, exposing content easily and rapidly while keeping security and reliability intact.

Requirement

The objective of this solution is to provide an application that offers information of marketing value, using data available in the WSO2 portal database.

Solution

Our solution contains a data service that exposes data as a Web service, and a Web application that consumes that data service to provide different views of recent trends in content submission and community involvement.

We chose to build a Web service to expose this information rather than write a Drupal module, because Web services allow the information to be accessed by non-PHP code and remote applications, with WS-* support for reliability and security.

The content exposed by the data service

  1. Nodes

    Nodes in Drupal can be pages, blog posts, forums etc. A node consists of an identifier, type (whether it is a blog, page or forum), title and content. Nodes mainly carry community feedback on the projects, such as new ideas, suggestions and issues. When retrieving multiple nodes, the data service queries truncate the content to a maximum of 200 characters. This makes the response faster and shorter, while still being sufficient to give an idea of the content of the node.

  2. Comments

    Comments are community feedback for a given forum topic or blog post. Comments also represent community suggestions and issues.

  3. Users

    Users form the community. It is important to track users, as this shows how the community is developing and keeps marketing informed of those users who have the potential to become paying customers.
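The 200-character truncation described for nodes can be sketched as a single SQL query. The following is a minimal illustration in Python with an in-memory SQLite database, not the WSF/PHP data service itself. The `node` table layout, its `body` column and the `get_nodes` function are assumptions made for this example; a real Drupal schema stores node content differently.

```python
import sqlite3

# Hypothetical, simplified stand-in for the Drupal node data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO node (nid, type, title, body) VALUES (?, ?, ?, ?)",
    [
        (1, "blog", "Short post", "A brief note."),
        (2, "forum", "Long post", "x" * 500),
    ],
)

MAX_PREVIEW_CHARS = 200  # the limit mentioned in the text


def get_nodes(conn):
    """Return every node with its body truncated to MAX_PREVIEW_CHARS."""
    rows = conn.execute(
        "SELECT nid, type, title, SUBSTR(body, 1, ?) FROM node ORDER BY nid",
        (MAX_PREVIEW_CHARS,),
    )
    return [dict(zip(("nid", "type", "title", "preview"), row)) for row in rows]


nodes = get_nodes(conn)
```

Truncating inside the query, rather than after fetching, keeps the full body from ever leaving the database, which is where the speed benefit would come from.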

The complete list of operations in the data service

WSO2 WSF/PHP can be used to wrap SQL queries as Web service operations. Here is the list of operations implemented in our solution:

  1. Get Nodes - Returns all nodes in a Drupal system
  2. Get Comments - Returns all the comments
  3. Get Comment by Node Id - Returns all comments for a given node id
  4. Get Node by Id - Returns node attributes for a given node id
  5. Get Comment by Comment Id - Returns comment attributes for a given comment id
  6. Get Nodes by Duration - Returns nodes submitted in a given duration
  7. Get Comments by Duration - Returns comments submitted in a given duration
  8. Get Users Posted by Duration - Returns users who have submitted nodes in a given duration
  9. Get Nodes Posted by User by Duration - Returns nodes submitted by a user in a given duration
  10. Get Comments Posted by User by Duration - Returns comments posted by a user in a given duration
  11. Get Users Commented by Duration - Returns users who have commented in a given duration
  12. Get Nodes Count by Duration - Returns the count of nodes submitted in a given duration
  13. Get Comments Count by Duration - Returns the count of comments submitted in a given duration
  14. Get Users Posted Count by Duration - Returns the count of users who have submitted nodes in a given duration
  15. Get Users Commented Count by Duration - Returns the count of users who have commented in a given duration
  16. Get Nodes Count Per Hour - Returns node count per hour for the last 24 hours
  17. Get Nodes Count Per Day - Returns the nodes count per day for the last 30 days
  18. Get Nodes Count Per Month - Returns the nodes count per month for the last 12 months
  19. Get Nodes Count Per Year - Returns node count per year for the last 5 years
  20. Get Comments Count Per Hour - Returns comment count per hour for the last 24 hours
  21. Get Comments Count Per Day - Returns comment count per day for the last 30 days
  22. Get Comments Count Per Month - Returns comment count per month for the last 12 months
  23. Get Comments Count Per Year - Returns comment count per year for the last 5 years
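The idea of wrapping SQL queries as named operations can be pictured with a small dispatch table. This is only a conceptual sketch in Python with SQLite; it does not use or reproduce the WSF/PHP data services API. The operation names, the `comments` table layout and the `call_operation` helper are all hypothetical.

```python
import sqlite3

# Hypothetical, simplified comments table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (cid INTEGER PRIMARY KEY, nid INTEGER, subject TEXT)")
conn.executemany(
    "INSERT INTO comments (cid, nid, subject) VALUES (?, ?, ?)",
    [(1, 7, "Nice idea"), (2, 7, "Found an issue"), (3, 8, "Question")],
)

# Each operation name maps to the SQL it wraps and the parameters it expects.
OPERATIONS = {
    "getComments": (
        "SELECT cid, nid, subject FROM comments ORDER BY cid",
        (),
    ),
    "getCommentsByNodeId": (
        "SELECT cid, nid, subject FROM comments WHERE nid = ? ORDER BY cid",
        ("nid",),
    ),
    "getCommentByCommentId": (
        "SELECT cid, nid, subject FROM comments WHERE cid = ?",
        ("cid",),
    ),
}


def call_operation(conn, name, **params):
    """Run the SQL behind an operation and return rows as dictionaries."""
    sql, param_names = OPERATIONS[name]
    args = tuple(params[p] for p in param_names)
    cursor = conn.execute(sql, args)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


all_comments = call_operation(conn, "getComments")
node_comments = call_operation(conn, "getCommentsByNodeId", nid=7)
one_comment = call_operation(conn, "getCommentByCommentId", cid=3)
```

Binding parameters with placeholders, as done here, keeps caller-supplied values out of the SQL text.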

These operations can be classified into the following groups:

Retrieving general information about content

  1. Get Nodes
  2. Get Comments
  3. Get Comment by Node Id
  4. Get Node by Id
  5. Get Comment by Comment Id

This set of operations targets clients who need all the information for a detailed analysis. On its own it has very little marketing value, as it returns large amounts of data without any prior filtering.

Retrieving the content for a given duration

  1. Get Nodes by Duration
  2. Get Comments by Duration
  3. Get Users Posted by Duration
  4. Get Nodes Posted by User by Duration
  5. Get Comments Posted by User by Duration
  6. Get Users Commented by Duration

These operations allow clients to get the information for a preferred duration. If the selected duration is the last hour or the last 24 hours, they return the content updates from the recent past. People involved in marketing can use this data for a detailed analysis of the trends and latest events around the community and the projects.
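A duration-based operation amounts to a query filtered on a creation timestamp. The sketch below, in Python with SQLite, assumes a simplified `node` table with a `created` column holding Unix timestamps and treats the duration as a half-open interval from `start` up to but not including `end`. The table layout and function names are illustrative assumptions, not the queries used in the original solution.

```python
import sqlite3

# Hypothetical, simplified table; a real Drupal schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, uid INTEGER, created INTEGER)"
)
conn.executemany(
    "INSERT INTO node (nid, title, uid, created) VALUES (?, ?, ?, ?)",
    [
        (1, "Old idea", 10, 1000),
        (2, "Recent issue", 11, 5000),
        (3, "Newest suggestion", 10, 5500),
    ],
)


def get_nodes_by_duration(conn, start, end):
    """Return (nid, title) for nodes created in the interval [start, end)."""
    rows = conn.execute(
        "SELECT nid, title FROM node "
        "WHERE created >= ? AND created < ? ORDER BY created",
        (start, end),
    )
    return rows.fetchall()


def get_users_posted_by_duration(conn, start, end):
    """Return the distinct user ids that created nodes in [start, end)."""
    rows = conn.execute(
        "SELECT DISTINCT uid FROM node "
        "WHERE created >= ? AND created < ? ORDER BY uid",
        (start, end),
    )
    return [uid for (uid,) in rows]


recent_nodes = get_nodes_by_duration(conn, 4000, 6000)
recent_users = get_users_posted_by_duration(conn, 4000, 6000)
```

A "last 24 hours" view would then call these functions with `end` set to the current time and `start` set 86400 seconds earlier.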

Retrieving the count of the content for a given duration

  1. Get Nodes Count by Duration
  2. Get Comments Count by Duration
  3. Get Users Posted Count by Duration
  4. Get Users Commented Count by Duration

These operations provide a count of the content submitted in a given duration. This count can be used as a parameter for quick analysis of the growth of the community and its activities.
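The count operations differ from the previous group only in that they aggregate inside the database. A minimal sketch in Python with SQLite follows, again assuming a simplified `comments` table with a Unix `timestamp` column; the schema and function names are hypothetical.

```python
import sqlite3

# Hypothetical, simplified comments table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (cid INTEGER PRIMARY KEY, nid INTEGER, uid INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO comments (cid, nid, uid, timestamp) VALUES (?, ?, ?, ?)",
    [(1, 7, 20, 100), (2, 7, 21, 250), (3, 8, 20, 300), (4, 8, 22, 900)],
)


def get_comments_count_by_duration(conn, start, end):
    """Count comments posted in the interval [start, end)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM comments WHERE timestamp >= ? AND timestamp < ?",
        (start, end),
    ).fetchone()
    return row[0]


def get_users_commented_count_by_duration(conn, start, end):
    """Count distinct users who commented in the interval [start, end)."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT uid) FROM comments "
        "WHERE timestamp >= ? AND timestamp < ?",
        (start, end),
    ).fetchone()
    return row[0]


comment_count = get_comments_count_by_duration(conn, 50, 400)
commenter_count = get_users_commented_count_by_duration(conn, 50, 400)
```

In this sample data three comments fall in the interval but only two distinct users wrote them, which is why the user count needs `COUNT(DISTINCT uid)` rather than `COUNT(*)`.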

Retrieving a series of values representing the count of the content created in the recent past

  1. Get Nodes Count Per Hour
  2. Get Nodes Count Per Day
  3. Get Nodes Count Per Month
  4. Get Nodes Count Per Year
  5. Get Comments Count Per Hour
  6. Get Comments Count Per Day
  7. Get Comments Count Per Month
  8. Get Comments Count Per Year

These operations provide detailed statistics of community involvement and project popularity over a given time. Using this data, we can plot graphs to observe trend patterns over a preferred period. The results can be used directly in marketing and project management decisions.
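A per-period series can be produced by grouping rows into fixed-width time buckets and filling the empty buckets with zero, so that a graph has one point per period. The sketch below uses Python with SQLite and assumes a simplified `node` table with a Unix `created` timestamp; the function name and schema are hypothetical. Fixed-width buckets suit hours and days; months and years vary in length and would need calendar-aware grouping instead.

```python
import sqlite3

DAY = 24 * 60 * 60  # seconds per day

# Hypothetical, simplified node table with Unix creation timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, created INTEGER)")

now = 10 * DAY  # fixed "current time" so the example is reproducible
conn.executemany(
    "INSERT INTO node (nid, created) VALUES (?, ?)",
    [
        (1, now - 1),         # within the most recent day
        (2, now - DAY - 5),   # one day earlier
        (3, now - DAY - 10),  # same bucket as node 2
        (4, now - 5 * DAY),   # outside a three-day window
    ],
)


def nodes_count_per_bucket(conn, now, bucket_seconds, buckets):
    """Return node counts per bucket, oldest first, for the window ending at now."""
    start = now - bucket_seconds * buckets
    rows = conn.execute(
        "SELECT (created - ?) / ? AS bucket, COUNT(*) FROM node "
        "WHERE created >= ? AND created < ? GROUP BY bucket",
        (start, bucket_seconds, start, now),
    )
    counts = [0] * buckets  # buckets with no rows stay at zero
    for bucket, count in rows:
        counts[bucket] = count
    return counts


per_day = nodes_count_per_bucket(conn, now, DAY, 3)
```

The resulting list can be passed straight to a charting component, with the last element representing the most recent period.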

The Marketing Application

The marketing application provides a user interface for displaying data retrieved using the data services mentioned earlier. It offers two representations of the data:

  1. Textual Representation
  2. Graphical Representation

Textual Representation

This shows a list of content that is of interest to the viewer. The content can be chosen from the categories listed above, namely:

  1. The nodes submitted
  2. The comments submitted
  3. The users who submitted them

The content can be restricted to one of the following durations:

  1. Last Day
  2. Last Month
  3. Last Year
  4. Last 5 Years

Graphical Representation

The fourth group of operations (the per-hour, per-day, per-month and per-year counts) supplies the data for the graphical view. It shows graphs for:

  1. The nodes
  2. The comments

Similar to the 'textual representation', it allows the viewer to choose the duration of the information.

Future Plans

Since we already have Web services exposing data from the Drupal database, we would be able to provide different views of this data in the marketing mashup, which uses the WSO2 Mashup Server. This will allow marketing people to access the information provided by this application from their mashup dashboard itself.

Summary

WSO2 WSF/PHP data services make it possible to extract data of great marketing value from the Drupal database behind the WSO2.org Web portal and expose it as Web services. PHP or any other framework that supports Web services can consume these services and provide views through which marketing people can access this information.

Author

Dimuthu Gamage is a Software Engineer at WSO2. dimuthu at wso2 dot com