You can implement controlled LDA with PyMC, which uses the Metropolis probe to examine hidden variables in the following graphical model: 
The academic building consists of 10 film reviews (5 positive and 5 negative), together with an associated star rating for each document. Star rating is known as the response variable of interest for each document. Documents and response variables are modeled together to find hidden topics that best predict response variables for future unlabeled documents. See the original paper for more information . Consider the following code:
import pymc as pm import numpy as np from sklearn.feature_extraction.text import TfidfVectorizer train_corpus = ["exploitative and largely devoid of the depth or sophistication ", "simplistic silly and tedious", "it so laddish and juvenile only teenage boys could possibly find it funny", "it shows that some studios firmly believe that people have lost the ability to think", "our culture is headed down the toilet with the ferocity of a frozen burrito", "offers that rare combination of entertainment and education", "the film provides some great insight", "this is a film well worth seeing", "a masterpiece four years in the making", "offers a breath of the fresh air of true sophistication"] test_corpus = ["this is a really positive review, great film"] train_response = np.array([3, 1, 3, 2, 1, 5, 4, 4, 5, 5]) - 3 #LDA parameters num_features = 1000 #vocabulary size num_topics = 4 #fixed for LDA tfidf = TfidfVectorizer(max_features = num_features, max_df=0.95, min_df=0, stop_words = 'english') #generate tf-idf term-document matrix A_tfidf_sp = tfidf.fit_transform(train_corpus) #size D x V print "number of docs: %d" %A_tfidf_sp.shape[0] print "dictionary size: %d" %A_tfidf_sp.shape[1] #tf-idf dictionary tfidf_dict = tfidf.get_feature_names() K = num_topics # number of topics V = A_tfidf_sp.shape[1] # number of words D = A_tfidf_sp.shape[0] # number of documents data = A_tfidf_sp.toarray() #Supervised LDA Graphical Model Wd = [len(doc) for doc in data] alpha = np.ones(K) beta = np.ones(V) theta = pm.Container([pm.CompletedDirichlet("theta_%s" % i, pm.Dirichlet("ptheta_%s" % i, theta=alpha)) for i in range(D)]) phi = pm.Container([pm.CompletedDirichlet("phi_%s" % k, pm.Dirichlet("pphi_%s" % k, theta=beta)) for k in range(K)]) z = pm.Container([pm.Categorical('z_%s' % d, p = theta[d], size=Wd[d], value=np.random.randint(K, size=Wd[d])) for d in range(D)]) @pm.deterministic def zbar(z=z): zbar_list = [] for i in range(len(z)): hist, bin_edges = np.histogram(z[i], bins=K) zbar_list.append(hist / float(np.sum(hist))) return pm.Container(zbar_list) eta = pm.Container([pm.Normal("eta_%s" % k, mu=0, tau=1.0/10**2) for k in range(K)]) y_tau = pm.Gamma("tau", alpha=0.1, beta=0.1) @pm.deterministic def y_mu(eta=eta, zbar=zbar): y_mu_list = [] for i in range(len(zbar)): y_mu_list.append(np.dot(eta, zbar[i])) return pm.Container(y_mu_list) #response likelihood y = pm.Container([pm.Normal("y_%s" % d, mu=y_mu[d], tau=y_tau, value=train_response[d], observed=True) for d in range(D)]) # cannot use p=phi[z[d][i]] here since phi is an ordinary list while z[d][i] is stochastic w = pm.Container([pm.Categorical("w_%i_%i" % (d,i), p = pm.Lambda('phi_z_%i_%i' % (d,i), lambda z=z[d][i], phi=phi: phi[z]), value=data[d][i], observed=True) for d in range(D) for i in range(Wd[d])]) model = pm.Model([theta, phi, z, eta, y, w]) mcmc = pm.MCMC(model) mcmc.sample(iter=1000, burn=100, thin=2) #visualize topics phi0_samples = np.squeeze(mcmc.trace('phi_0')[:]) phi1_samples = np.squeeze(mcmc.trace('phi_1')[:]) phi2_samples = np.squeeze(mcmc.trace('phi_2')[:]) phi3_samples = np.squeeze(mcmc.trace('phi_3')[:]) ax = plt.subplot(221) plt.bar(np.arange(V), phi0_samples[-1,:]) ax = plt.subplot(222) plt.bar(np.arange(V), phi1_samples[-1,:]) ax = plt.subplot(223) plt.bar(np.arange(V), phi2_samples[-1,:]) ax = plt.subplot(224) plt.bar(np.arange(V), phi3_samples[-1,:]) plt.show()
Given the learning data (observable words and response variables), we can find out global topics (beta) and regression coefficients (eta) to predict the response variable (Y) in addition to the theme proportions for each document (theta). To make predictions of Y based on the studied beta and eta, we can define a new model where we do not observe Y and use the previously studied beta and eta to get the following result:

Here we predicted a positive review (approximately 2 given a range of ratings from -2 to 2) for a test case consisting of one sentence: “this is really a positive review, a great movie”, as shown by the back histogram mode on the right. For a full implementation, see ipython notebook .