Questions tagged [statsmodels]

Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.

135 votes
6 answers
279k views

Run an OLS regression with Pandas Data Frame

I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example: import pandas as pd df = pd.DataFrame({"A": [10,20,30,...
Michael (13.6k reputation)
122 votes
7 answers
86k views

Weighted standard deviation in NumPy

numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?
YGA (9,836 reputation)
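A common workaround, sketched here with NumPy only: take the weighted mean with `np.average`, then the weighted average of the squared deviations from it. This gives the population (ddof=0) version and treats integer weights like repeat counts; the helper name `weighted_std` is hypothetical. statsmodels also offers a weighted-statistics class, `DescrStatsW` in `statsmodels.stats.weightstats`, with a `std` attribute.

```python
import numpy as np


def weighted_std(values, weights):
    """Weighted population standard deviation (ddof=0)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    # Weighted average of squared deviations from the weighted mean
    variance = np.average((values - mean) ** 2, weights=weights)
    return np.sqrt(variance)


# Integer weights behave like repeated observations:
# [0, 3] with weights [1, 2] matches np.std([0, 3, 3]).
print(weighted_std([0, 3], [1, 2]), np.std([0, 3, 3]))
```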
89 votes
13 answers
91k views

ValueError: numpy.dtype has the wrong size, try recompiling

I just installed the pandas and statsmodels packages on my Python 2.7. When I tried "import pandas as pd", this error message came out. Can anyone help? Thanks! numpy.dtype has the wrong size, try ...
Amber Chen
88 votes
10 answers
97k views

auto.arima() equivalent for python

I am trying to predict weekly sales using ARMA/ARIMA models. I could not find a function for tuning the order (p,d,q) in statsmodels. Currently R has a function forecast::auto.arima() which will tune ...
Ajax (1,719 reputation)
65 votes
5 answers
84k views

Pythonic way of detecting outliers in one dimensional observation data

For the given data, I want to set the outlier values (defined by a 95% confidence level, a 95% quantile function, or anything else that is required) to NaN. Following are my data and the code that I am ...
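One simple reading of the question, sketched with NumPy: treat anything outside the central 95% of the observed values (below the 2.5% quantile or above the 97.5% quantile) as an outlier and replace it with NaN. The cut-offs and the helper name are assumptions; note that a quantile rule always flags roughly 5% of the data, whether or not those points are truly anomalous.

```python
import numpy as np


def mask_outside_quantiles(data, lower=0.025, upper=0.975):
    """Return a float copy of `data` with values outside [q_lower, q_upper] set to NaN."""
    data = np.asarray(data, dtype=float).copy()
    lo, hi = np.nanquantile(data, [lower, upper])
    data[(data < lo) | (data > hi)] = np.nan
    return data


cleaned = mask_outside_quantiles(np.arange(101))
print(np.isnan(cleaned).sum())  # 6: the values 0, 1, 2 and 98, 99, 100 are masked
```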
65 votes
9 answers
120k views

Variance Inflation Factor in Python

I'm trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in Python:
a b c d
1 2 4 4
1 2 6 3
2 3 7 4
3 2 8 5
4 1 9 4
I have already done this in R using the ...
Nizag (939 reputation)
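The VIF of column j is 1 / (1 - R_j²), where R_j² comes from regressing column j on all the other columns plus an intercept. A NumPy/pandas sketch of that definition, using the data from the question, follows; `vif_table` is a hypothetical name. statsmodels also has `variance_inflation_factor(exog, exog_idx)` in `statsmodels.stats.outliers_influence`, which regresses on the other columns of `exog` exactly as given, so a constant column has to be included there to get the usual centered VIF.

```python
import numpy as np
import pandas as pd


def vif_table(df):
    """VIF for each column: 1 / (1 - R^2) from regressing it on the others plus an intercept."""
    X = df.to_numpy(dtype=float)
    n = X.shape[0]
    vifs = {}
    for j, name in enumerate(df.columns):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r_squared = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs[name] = 1.0 / (1.0 - r_squared)
    return pd.Series(vifs)


df = pd.DataFrame({"a": [1, 1, 2, 3, 4], "b": [2, 2, 3, 2, 1],
                   "c": [4, 6, 7, 8, 9], "d": [4, 3, 4, 5, 4]})
print(vif_table(df))
```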
63 votes
7 answers
107k views

confidence and prediction intervals with StatsModels

I do this linear regression with StatsModels: import numpy as np import statsmodels.api as sm from statsmodels.sandbox.regression.predstd import wls_prediction_std n = 100 x = np.linspace(0, 10, n) ...
F.N.B (1,559 reputation)
57 votes
6 answers
82k views

Why do I get only one parameter from a statsmodels OLS fit

Here is what I am doing: $ python Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin >>> import statsmodels.api as sm >>> ...
Tom (2,859 reputation)
50 votes
5 answers
142k views

Building multi-regression model throws error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`

I have a pandas DataFrame with some categorical predictors (i.e. variables) coded as 0 & 1, and some numeric variables. When I fit it with a statsmodels model like: est = sm.OLS(y, X).fit() It throws: Pandas ...
Sanoj (1,417 reputation)
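That message usually means at least one column of `X` has `object` dtype (for example 0/1 flags stored as strings, or mixed types), so `np.asarray(X)` cannot produce a numeric array. A pandas-only sketch of the check and one possible conversion; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical predictors: a 0/1 flag stored as strings, a boolean flag, a numeric column
X = pd.DataFrame({
    "is_member": ["0", "1", "1", "0"],
    "is_new": [True, False, True, True],
    "income": [1.5, 2.0, 3.2, 4.1],
})
print(X.dtypes)               # is_member is object
print(np.asarray(X).dtype)    # object: the dtype statsmodels complains about

# Parse strings to numbers, then cast every column (including bools) to float
X_numeric = X.apply(pd.to_numeric).astype(float)
print(np.asarray(X_numeric).dtype)  # float64
```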
47 votes
11 answers
60k views

Where can I find mad (mean absolute deviation) in scipy?

It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers: http://projects.scipy.org/scipy/browser/trunk/scipy/stats/models/utils.py?rev=3473 However, ...
Ton van den Heuvel
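Both statistics are one-liners in NumPy. "MAD" is also commonly used for the median absolute deviation, which is what `scipy.stats.median_abs_deviation` (SciPy 1.5 and later) and `statsmodels.robust.scale.mad` compute; the statsmodels version is rescaled by default so that it estimates the standard deviation for normal data. The helper names below are hypothetical.

```python
import numpy as np


def mean_abs_deviation(x):
    """Mean absolute deviation around the mean."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - x.mean()))


def median_abs_deviation(x):
    """Median absolute deviation around the median (unscaled)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))


data = [1, 2, 3, 4, 10]
print(mean_abs_deviation(data))    # 2.4
print(median_abs_deviation(data))  # 1.0
```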
46 votes
5 answers
45k views

Print 'std err' value from statsmodels OLS results

(Sorry to ask but http://statsmodels.sourceforge.net/ is currently down and I can't access the docs) I'm doing a linear regression using statsmodels, basically: import statsmodels.api as sm model = ...
Gabriel (41.8k reputation)
46 votes
5 answers
128k views

How to extract the regression coefficient from statsmodels.api?

result = sm.OLS(gold_lookback, silver_lookback).fit() After I get the result, how can I get the coefficient and the constant? In other words, if y = ax + c, how do I get the values a and c?
JOHN (1,461 reputation)
44 votes
9 answers
64k views

Converting statsmodels summary object to Pandas Dataframe

I am doing multiple linear regression with statsmodels.formula.api (ver 0.9.0) on Windows 10. After fitting the model and getting the summary with the following lines, I get the summary as a summary object ...
Sagun Kayastha
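Rather than parsing the printed summary, the coefficient table can be rebuilt from the results attributes (`params`, `bse`, `pvalues`, `conf_int()`), which also puts individual p-values into ordinary variables. A sketch with a hypothetical formula and synthetic data; in recent versions, `summary2().tables[1]` is another route that already returns a DataFrame.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
df["y"] = 1.0 + 2.0 * df["x1"] - 0.5 * df["x2"] + rng.normal(scale=0.3, size=50)

results = smf.ols("y ~ x1 + x2", data=df).fit()

ci = results.conf_int(alpha=0.05)  # DataFrame with columns 0 (lower) and 1 (upper)
table = pd.DataFrame({
    "coef": results.params,
    "std_err": results.bse,
    "p_value": results.pvalues,
    "ci_lower": ci[0],
    "ci_upper": ci[1],
})
print(table)

p_x1 = results.pvalues["x1"]  # a single p-value as a float
```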
44 votes
7 answers
181k views

ImportError: No module named statsmodels

I downloaded the StatsModels source from this location. Then I untarred it to /usr/local/lib/python2.7/dist-packages and, per this documentation, ran sudo python setup.py install. It installed but ...
Stripers247 (2,285 reputation)
42 votes
3 answers
48k views

What's the difference between pandas ACF and statsmodels ACF?

I'm calculating the Autocorrelation Function for a stock's returns. To do so I tested two functions, the autocorr function built into Pandas, and the acf function supplied by statsmodels.tsa. This is ...
BML91 (3,092 reputation)
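The two use different formulas. pandas' `Series.autocorr(lag)` is the Pearson correlation between the series and its shifted self, so each overlapping segment gets its own mean and standard deviation. The standard sample ACF, which `statsmodels.tsa.stattools.acf` computes by default, uses the full-sample mean and divides by the full-sample sum of squares. A NumPy sketch of both (function names are hypothetical):

```python
import numpy as np


def pearson_lag_corr(x, k):
    """Correlation of x[t] with x[t+k]; what pandas' Series.autocorr(lag=k) computes."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-k], x[k:])[0, 1]


def sample_acf(x, k):
    """Standard sample autocorrelation at lag k >= 1 (full-sample mean and variance)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[:-k] * d[k:]) / np.sum(d * d)


x = [1, 2, 3, 4, 5]
print(pearson_lag_corr(x, 1))  # approximately 1.0: both segments are perfectly linear
print(sample_acf(x, 1))        # 0.4
```

The gap shrinks for long stationary series, but for short or trending series like stock prices the two numbers can differ noticeably.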
41 votes
7 answers
22k views

Highest Posterior Density Region and Central Credible Region

Given a posterior p(Θ|D) over some parameters Θ, one can define the following: Highest Posterior Density Region: The Highest Posterior Density Region is the set of most probable values of Θ that, in ...
Amelio Vazquez-Reina
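With posterior samples (for example from MCMC), both regions can be estimated directly. The sketch below assumes a unimodal posterior, so the HPD region is a single interval: the shortest window that contains the required fraction of the sorted samples. The function names and the exponential stand-in posterior are assumptions.

```python
import numpy as np


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (unimodal posterior assumed)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    m = int(np.ceil(mass * n))          # number of samples the interval must cover
    widths = x[m - 1:] - x[:n - m + 1]  # width of every window of m consecutive samples
    i = int(np.argmin(widths))
    return x[i], x[i + m - 1]


def central_interval(samples, mass=0.95):
    """Equal-tailed interval: cut (1 - mass) / 2 from each tail."""
    tail = (1.0 - mass) / 2.0
    lo, hi = np.quantile(samples, [tail, 1.0 - tail])
    return lo, hi


rng = np.random.default_rng(0)
draws = rng.exponential(scale=1.0, size=20_000)  # a skewed stand-in "posterior"
print(hpd_interval(draws))      # starts near 0, where the density is highest
print(central_interval(draws))  # shifted right and wider
```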
39 votes
4 answers
23k views

Using statsmodels estimations with scikit-learn cross validation, is it possible?

I am looking for a way to feed the fit object (result) obtained from Python statsmodels into cross_val_score from scikit-learn's cross_validation module. The attached link suggests that it may be ...
CARTman (747 reputation)
38 votes
1 answer
41k views

ANOVA in python using pandas dataframe with statsmodels or scipy?

I want to use the Pandas dataframe to break down the variance in one variable. For example, if I have a column called 'Degrees', and I have this indexed for various dates, cities, and night vs. day, I ...
wolfsatthedoor
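One common pattern is to fit an OLS model with the factors wrapped in `C(...)` and pass it to `anova_lm` from `statsmodels.stats.anova`. A sketch with synthetic data: the column names follow the question, but the values, the chosen factors, and `typ=2` (Type II sums of squares) are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(7)
n = 120
df = pd.DataFrame({
    "City": rng.choice(["Boston", "Denver", "Miami"], size=n),
    "DayNight": rng.choice(["day", "night"], size=n),
})
city_effect = df["City"].map({"Boston": 0.0, "Denver": -3.0, "Miami": 8.0})
night_effect = np.where(df["DayNight"] == "night", -10.0, 0.0)
df["Degrees"] = 60 + city_effect + night_effect + rng.normal(0, 2, n)

model = smf.ols("Degrees ~ C(City) + C(DayNight)", data=df).fit()
table = anova_lm(model, typ=2)  # sum_sq, df, F and PR(>F) per factor
print(table)
```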
36 votes
1 answer
10k views

How to silence statsmodels.fit() in python

When I want to fit some model in Python, I often use the fit() method in statsmodels. In some cases I write a script to automate fitting: import statsmodels.formula.api as smf import pandas as pd df =...
keisuke (2,233 reputation)
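For models whose `fit()` prints optimizer messages, such as `Logit`, passing `disp=0` (or `disp=False`) suppresses that output. A sketch with synthetic data; the formula and coefficients are assumptions, and the `redirect_stdout` wrapper is only there to show that nothing is printed.

```python
import contextlib
import io

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"x": rng.normal(size=500)})
prob = 1 / (1 + np.exp(-(0.5 + 1.5 * df["x"])))
df["y"] = (rng.random(500) < prob).astype(int)

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    quiet = smf.logit("y ~ x", data=df).fit(disp=0)  # no "Optimization terminated" message
print(repr(buffer.getvalue()))  # ''
print(quiet.params)
```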
35 votes
2 answers
15k views

Pandas rolling regression: alternatives to looping

I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20. The question of how to run rolling OLS ...
Brad Solomon (39.7k reputation)
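For a single regressor, the rolling slope and intercept have closed forms (slope = cov(x, y) / var(x)), so pandas' rolling `cov`, `var` and `mean` produce them without a Python loop. A sketch under that single-regressor assumption; the helper name is hypothetical. For several regressors, statsmodels 0.11 and later includes `RollingOLS` in `statsmodels.regression.rolling`.

```python
import numpy as np
import pandas as pd


def rolling_simple_ols(y, x, window):
    """Rolling slope and intercept of y = intercept + slope * x over a fixed window."""
    slope = y.rolling(window).cov(x) / x.rolling(window).var()
    intercept = y.rolling(window).mean() - slope * x.rolling(window).mean()
    return pd.DataFrame({"slope": slope, "intercept": intercept})


x = pd.Series(np.arange(10, dtype=float))
y = 2.0 * x + 1.0
out = rolling_simple_ols(y, x, window=4)
print(out)  # first 3 rows NaN, then slope 2 and intercept 1
```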
33 votes
3 answers
7k views

What are the pitfalls of using Dill to serialise scikit-learn/statsmodels models?

I need to serialise scikit-learn/statsmodels models such that all the dependencies (code + data) are packaged in an artefact and this artefact can be used to initialise the model and make predictions. ...
Nikhil (2,270 reputation)
32 votes
3 answers
12k views

Confidence interval for LOWESS in Python

How would I calculate the confidence intervals for a LOWESS regression in Python? I would like to add these as a shaded region to the LOESS plot created with the following code (other packages than ...
pir (5,705 reputation)
31 votes
2 answers
131k views

Why am I getting "LinAlgError: Singular matrix" from grangercausalitytests?

I am trying to run grangercausalitytests on two time series: import numpy as np import pandas as pd from statsmodels.tsa.stattools import grangercausalitytests n = 1000 ls = np.linspace(0, 2*np.pi, ...
Stefan Falk (24.7k reputation)
31 votes
3 answers
35k views

OLS Regression: Scikit vs. Statsmodels? [closed]

Short version: I was using the scikit LinearRegression on some data, but I'm used to p-values so put the data into the statsmodels OLS, and although the R^2 is about the same the variable coefficients ...
Nat Poor (451 reputation)
30 votes
3 answers
17k views

statsmodels linear regression - patsy formula to include all predictors in model

Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and the . will include ...
Greg (7,021 reputation)
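Patsy formulas have no `.` shorthand, so one workaround is to build the formula string from the DataFrame's columns. A sketch; `DF`, `y` and `x1` to `x3` follow the question, the values are made up, and the helper name is hypothetical. Column names that are not valid Python identifiers would need patsy's `Q("...")` quoting.

```python
import pandas as pd


def all_predictors_formula(df, response):
    """Build 'response ~ col1 + col2 + ...' from every other column of df."""
    predictors = [col for col in df.columns if col != response]
    return f"{response} ~ " + " + ".join(predictors)


DF = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0], "x1": [0, 1, 0, 1],
                   "x2": [2, 3, 5, 7], "x3": [1, 1, 2, 3]})
formula = all_predictors_formula(DF, "y")
print(formula)  # y ~ x1 + x2 + x3
# Then, assuming `import statsmodels.formula.api as smf`: smf.ols(formula, data=DF).fit()
```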
30 votes
2 answers
62k views

How to plot statsmodels linear regression (OLS) cleanly

Problem Statement: I have some nice data in a pandas dataframe. I'd like to run simple linear regression on it: Using statsmodels, I perform my regression. Now, how do I get my plot? I've tried ...
Alex Lenail (13.8k reputation)
29 votes
2 answers
32k views

Capturing high multi-collinearity in statsmodels

Say I fit a model in statsmodels mod = smf.ols('dependent ~ first_category + second_category + other', data=df).fit() When I do mod.summary() I may see the following: Warnings: [1] The condition ...
Amelio Vazquez-Reina
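The condition number behind that warning is, mathematically, the ratio of the largest to the smallest singular value of the design matrix (equivalently, the square root of the eigenvalue ratio of X'X). The right singular vector belonging to the smallest singular value shows which combination of columns is nearly zero, which points at the collinear variables. A NumPy sketch with a deliberately collinear synthetic design (x2 ≈ 2 * x1); the data and helper name are assumptions.

```python
import numpy as np


def collinearity_report(X):
    """Return the condition number of X and the near-null direction of its columns."""
    X = np.asarray(X, dtype=float)
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    return s[0] / s[-1], vt[-1]


rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + 1e-3 * rng.normal(size=n)  # almost an exact multiple of x1
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

cond, direction = collinearity_report(X)
print(f"condition number: {cond:.0f}")
print(np.round(direction, 3))  # loads on x1 and x2 in a 2 : -1 ratio (up to sign)
```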
29 votes
3 answers
20k views

Python statistics package: difference between statsmodels and scipy.stats [closed]

I need some advice on selecting a statistics package for Python. I've done quite a bit of searching, but I'm not sure I have everything right, specifically on the differences between statsmodels and scipy.stats. ...
herrfz (4,864 reputation)
27 votes
3 answers
26k views

Python statsmodels - quadratic term in regression

I have the following linear regression: import statsmodels.formula.api as sm model = sm.ols(formula = 'a ~ b + c', data = data).fit() I want to add a quadratic term for b in this model. Is there a ...
datavoredan (3,676 reputation)
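In a patsy formula, `I(...)` makes the enclosed arithmetic be evaluated as Python rather than read as formula operators, so `I(b**2)` adds a squared term. A sketch with synthetic, noise-free data (the coefficients are assumptions), keeping the question's `sm` alias for `statsmodels.formula.api`:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

rng = np.random.default_rng(5)
data = pd.DataFrame({"b": rng.normal(size=30), "c": rng.normal(size=30)})
# Exact relationship, no noise: a = 1 + 2b + 3b^2 + 0.5c
data["a"] = 1.0 + 2.0 * data["b"] + 3.0 * data["b"] ** 2 + 0.5 * data["c"]

model = sm.ols(formula="a ~ b + I(b**2) + c", data=data).fit()
print(model.params)  # Intercept, b, the squared term, c: close to 1, 2, 3, 0.5
```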
27 votes
3 answers
67k views

How to get the P Value in a Variable from OLSResults in Python?

The OLSResults of df2 = pd.read_csv("MultipleRegression.csv") X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']] Y = df2['Price'] X = add_constant(X) fit = sm.OLS(Y, X).fit() print(fit....
Addzy K (715 reputation)
26 votes
2 answers
39k views

Error: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting

So I have a CSV file with two columns: date and price, but when I tried to use ARIMA on that time series I encountered this error: ValueWarning: A date index has been provided, but it has no ...
Dorki (1,121 reputation)
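The warning goes away once the DatetimeIndex carries a frequency. One way, sketched with pandas only and hypothetical monthly data, is to infer the frequency and apply it with `asfreq`. With daily or business-day data the inferred code is different, and `asfreq` inserts NaN rows for any missing dates, which then have to be handled.

```python
import pandas as pd

# Hypothetical data as it might come out of read_csv(..., parse_dates=["date"])
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
    "price": [10.0, 10.5, 10.2, 11.0],
})
series = df.set_index("date")["price"]
print(series.index.freq)  # None: this is what triggers the ValueWarning

freq = pd.infer_freq(series.index)  # "MS" (month start) for this data
series = series.asfreq(freq)
print(series.index.freqstr)  # MS
```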
25 votes
3 answers
60k views

Fixed effect in Pandas or Statsmodels

Is there an existing function to estimate fixed effects (one-way or two-way) in Pandas or Statsmodels? There used to be a function in Statsmodels, but it seems to have been discontinued. And in Pandas, there is ...
user3576212 (3,395 reputation)
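A one-way fixed-effects (within) estimator can be sketched with pandas: subtract each entity's mean from the outcome and the regressors, then run least squares on the demeaned data. This recovers the slope coefficients only; standard errors from such a naive regression need a degrees-of-freedom correction that dedicated panel tools apply. Adding entity dummies with `C(entity)` in a statsmodels formula gives the same slopes. The data and helper name below are hypothetical.

```python
import numpy as np
import pandas as pd


def within_estimator(df, y, xs, entity):
    """One-way fixed-effects slopes via entity demeaning (no intercept needed)."""
    cols = [y] + xs
    demeaned = df[cols] - df.groupby(entity)[cols].transform("mean")
    coef, *_ = np.linalg.lstsq(demeaned[xs].to_numpy(), demeaned[y].to_numpy(), rcond=None)
    return pd.Series(coef, index=xs)


rng = np.random.default_rng(11)
panel = pd.DataFrame({"firm": np.repeat(["A", "B", "C"], 5), "x": rng.normal(size=15)})
firm_effect = panel["firm"].map({"A": 10.0, "B": -4.0, "C": 2.5})
panel["y"] = firm_effect + 2.0 * panel["x"]  # noise-free, so the slope is exactly 2

print(within_estimator(panel, "y", ["x"], "firm"))
```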
25 votes
1 answer
16k views

Python statsmodels ARIMA LinAlgError: SVD did not converge

Background: I'm developing a program using statsmodels that fits 27 arima models (p,d,q=0,1,2) to over 100 variables and chooses the model with the lowest aic and statistically significant t-...
asdf (846 reputation)
24 votes
4 answers
52k views

ImportError: cannot import name 'factorial'

I want to use a logit model and am trying to import the statsmodels library. My version: Python 3.6.8. The best suggestion I got is to downgrade scipy, but it is unclear how to do that and to which version I should ...
Bhavya Geethika
24 votes
2 answers
28k views

Python statsmodels OLS: how to save learned model to file

I am trying to learn an ordinary least squares model using Python's statsmodels library, as described here. sm.OLS.fit() returns the learned model. Is there a way to save it to a file and reload it?...
Nik (5,635 reputation)
24 votes
2 answers
27k views

What statistics module for python supports one way ANOVA with post hoc tests (Tukey, Scheffe or other)?

I have tried looking through multiple statistics modules for Python but can't seem to find any that support one-way ANOVA post hoc tests.
david_adler (10.4k reputation)
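statsmodels provides Tukey's HSD as `pairwise_tukeyhsd` in `statsmodels.stats.multicomp`, and the one-way ANOVA itself can come from `scipy.stats.f_oneway`. A sketch with synthetic groups (the group names and means are assumptions):

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(4)
a = rng.normal(10.0, 1.0, size=30)
b = rng.normal(10.2, 1.0, size=30)
c = rng.normal(13.0, 1.0, size=30)

f_stat, p_value = f_oneway(a, b, c)  # one-way ANOVA
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")

values = np.concatenate([a, b, c])
groups = np.repeat(["a", "b", "c"], 30)
tukey = pairwise_tukeyhsd(endog=values, groups=groups, alpha=0.05)
print(tukey.summary())  # one row per pair: mean difference, adjusted p-value, reject
```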
24 votes
3 answers
13k views

Any Python Library That Produces Publication-Style Regression Tables?

I've been using Python for regression analysis. After getting the regression results, I need to summarize all the results into one single table and convert them to LaTeX (for publication). Is there ...
Titanic (557 reputation)
24 votes
2 answers
54k views

How to get the regression intercept using Statsmodels.api

I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use the library: import statsmodels.api as sm It prints all the regression analysis ...
Shank (675 reputation)
24 votes
7 answers
39k views

Predicting on new data using locally weighted regression (LOESS/LOWESS)

How to fit a locally weighted regression in python so that it can be used to predict on new data? There is statsmodels.nonparametric.smoothers_lowess.lowess, but it returns the estimates only for the ...
max (50.8k reputation)
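`lowess` returns the smoothed curve only at the training x values (sorted by x), so one simple way to predict at new points is to interpolate along that curve with `np.interp`. This is a sketch, not a true local re-fit at each new point, and it holds the curve constant outside the training range; the helper name and data are assumptions.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess


def lowess_predict(x, y, x_new, frac=2 / 3, it=3):
    """Fit LOWESS on (x, y), then linearly interpolate the smoothed curve at x_new."""
    fitted = lowess(y, x, frac=frac, it=it)  # columns: sorted x, smoothed y
    return np.interp(x_new, fitted[:, 0], fitted[:, 1])


rng = np.random.default_rng(8)
x = rng.uniform(0, 10, size=200)
y = np.sin(x) + rng.normal(scale=0.2, size=200)
print(lowess_predict(x, y, [2.5, 5.0, 7.5]))
```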
22 votes
5 answers
22k views

Changing fig size with statsmodels

I am trying to make QQ-plots using the statsmodels package. However, the resolution of the figure is so low that I could not possibly use the results in a presentation. I know that to make NetworkX ...
mlg4080 (423 reputation)
22 votes
5 answers
67k views

Decomposing trend, seasonal and residual time series elements

I have a DataFrame with a few time series:
         divida  movav12  var  varmovav12
Date
2004-01       0      NaN  NaN         NaN
2004-02 ...
aabujamra (4,564 reputation)
22 votes
3 answers
60k views

logit regression and singular Matrix error in Python

I am trying to run a logit regression on the German credit data (www4.stat.ncsu.edu/~boos/var.select/german.credit.html). To test the code, I have used only the numerical variables and tried regressing it with ...
user3122731
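A frequent cause of `Singular matrix` is a design matrix without full column rank, for example an intercept plus a dummy for every category, or one column that duplicates another. A NumPy rank check on hypothetical data follows. Perfect separation, where a predictor splits the outcome cleanly, can also break a logit fit, and it needs a different remedy.

```python
import numpy as np

# Hypothetical design: intercept + a dummy for EVERY level of a 3-level category
level = np.array([0, 1, 2, 0, 1, 2, 0, 1])
dummies = np.eye(3)[level]  # one 0/1 column per level
X_bad = np.column_stack([np.ones(len(level)), dummies])
X_ok = np.column_stack([np.ones(len(level)), dummies[:, 1:]])  # drop one level

for name, X in [("all levels", X_bad), ("one level dropped", X_ok)]:
    rank = np.linalg.matrix_rank(X)
    print(f"{name}: rank {rank} of {X.shape[1]} columns")
```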
22 votes
2 answers
27k views

Statsmodels ARIMA - Different results using predict() and forecast()

I use ARIMA from statsmodels package in order to predict values from a series: plt.plot(ind, final_results.predict(start=0 ,end=26)) plt.plot(ind, forecast.values) plt.show() I thought that I would ...
Simone (4,890 reputation)
22 votes
2 answers
7k views

Difference in Python statsmodels OLS and R's lm

I'm not sure why I'm getting slightly different results for a simple OLS, depending on whether I go through pandas' experimental rpy interface to do the regression in R or whether I use statsmodels in ...
Skylar Saveland
22 votes
3 answers
14k views

Understanding output from statsmodels grangercausalitytests

I'm new to Granger Causality and would appreciate any advice on understanding/interpreting the results of the python statsmodels output. I've constructed two data sets (sine functions shifted in time ...
Wilhelm (363 reputation)
21 votes
2 answers
47k views

Holt-Winters time series forecasting with statsmodels

I tried forecasting with a Holt-Winters model as shown below, but I keep getting a prediction that is not consistent with what I expect. I also showed a visualization of the plot Train = Airline[:130] ...
Mujeebla (213 reputation)
21 votes
2 answers
57k views

Linear regression with dummy/categorical variables

I have a set of data. I have used pandas to convert the variables into dummy and categorical variables, respectively. So now I want to know how to run a multiple linear regression (I am using statsmodels) in ...
Héctor Alonso
21 votes
3 answers
18k views

Specifying which category to treat as the base with 'statsmodels'

I understand that when I have a categorical variable in a model passed to a statsmodels fit, dummy variables will automatically be generated for the categories. For example if I have a variable '...
orome (47k reputation)
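With the formula interface, the reference level can be set inside `C()` using patsy's `Treatment` contrast. A sketch with synthetic data; the variable names and the chosen reference level `'b'` are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
df = pd.DataFrame({"group": np.tile(["a", "b", "c"], 20)})
df["y"] = df["group"].map({"a": 1.0, "b": 3.0, "c": 0.0}) + rng.normal(scale=0.1, size=60)

res = smf.ols("y ~ C(group, Treatment(reference='b'))", data=df).fit()
print(res.params)  # Intercept is the mean of level 'b'; [T.a] and [T.c] are offsets from it
```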
21 votes
2 answers
9k views

Poisson Regression in statsmodels and R

Given some randomly generated data with 2 columns, 50 rows, and integers in the range 0-100. With R, the Poisson GLM and diagnostic plots can be produced as follows: > col=2 > row=50 > ...
alvas (119k reputation)
20 votes
1 answer
50k views

ValueWarning: No frequency information was provided, so inferred frequency MS will be used

I am trying to fit an autoregression with sm.tsa.statespace.SARIMAX, but I get a warning, so I want to set frequency information for this model. Has anyone run into this? Can you help me? fit1 = sm.tsa....
Lê Ngọc Thạch
